PREFACE


As in the 1949 and 1955 editions, the aim of this edition is to provide a 
concise, integrated coverage of the statistical techniques most frequently 
needed in psychology and the other behavioral sciences. The stress 
continues to be on interpretations and assumptions rather than on computations,
and the level, although designed for an intermediate course, is not
beyond the grasp of students in elementary courses who are unafraid of 
mathematical reasoning. 

In preparing the third edition, I have made numerous minor revisions
and insertions throughout. I have shifted to a μ and M (and X̄) notation
for population and sample means, and to a σ, S, and s notation for standard
deviations (or variances). Although mathematical statisticians seem
to get along with σ and s as population and estimated values, I still feel
that the ordinary standard deviation, or maximum likelihood estimator
(herein symbolized as S but for years symbolized as σ in many textbooks),
is definitely needed. Another notational change simplifies the writing of 
models and expected values in the analysis of variance. 

The expected values of variance estimates in the analysis of variance 
have been revised to conform with the resolution of a controversy among 
mathematical statisticians, an outcome that outmoded “bits” of the 1955 
edition while in press. 

Principal extensions are a chapter on trend analysis and exercises beyond 
those needed for a course in elementary statistics. To avoid exercises 
calling for either derivations or computations, an attempt has been made 
to provide “thought” questions of challenging difficulty. 

Other major additions include the following: the development of the 
error formula for the difference between independent proportions; the 
empirical work on the effect of assumption violations on the t and F tests;
significance tests for regression coefficients; a greatly extended discussion
of the effects of errors of measurement (prompted, in part, by the frequent
use of homemade, likely unreliable, psychological tests); a brief presentation
of an old topic, the correlation of sums; a simple, though ancient,
derivation of the standard error of the mean formula; a proof that s² is
unbiased; a derivation showing the connection between variance and
χ², and the deduction of the χ², t, and z (critical ratio) tests from the F test;
an algebraic determination of the expected values of variance estimates 
in one-way analysis of variance; more on reliability by way of analysis 
of variance; the connection between the concepts of interaction and 
correlation; a further explication of Latin square usage in psychological 
research; and additional nonparametric techniques, along with a discus- 
sion which indicates that I disagree with one of the reasons advocated for 
the usage of nonparametric methods. 

No attempt will be made here to disentangle and acknowledge all of 
the factors, personal and otherwise, that have influenced the writing and 
revising of this book. I would, however, be remiss if I did not mention 
the early impact of two of my teachers, the late Truman L. Kelley and 
Harold Hotelling. As was true for the first two editions, my greatest 
personal indebtedness is again to Olga W. McNemar for her critical 
acumen and sympathetic help in the struggle for clarity of exposition. 

I am grateful to Professor Ronald A. Fisher and Dr. Frank Yates, also
to Messrs. Oliver and Boyd Limited, Edinburgh, for permission to reprint
Tables III, IV, V, and VII from their book Statistical Tables for Biological,
Agricultural and Medical Research.
QUINN McNEMAR
Palo Alto, California 
February, 1962 
Chapter 1
INTRODUCTION 


Statistical methods are concerned with the reducing of either large or 
small masses of data to a few convenient descriptive terms and with the 
drawing of inferences therefrom. The data are collected by any of several 
methods of research with the aid of measuring devices appropriate to a 
given area of investigation. The research methods are variously named and 
classified. Thus in psychology we have methods which are labeled experi- 
mental, clinical, observational, etc. The devices for measuring or securing 
responses vary from those which involve delicate apparatus through 
paper-and-pencil schemes to controlled observations and interviews. 
Statistical techniques are not to be considered as coordinate either with 
research methods or with devices for obtaining and recording responses, 
but rather as tools for analyzing data collected by whatever means. 

The reduction of a batch of data to a few descriptive measures is the part 
of statistical analysis which should lead to a better over-all comprehension 
of the data. All readers will be more or less familiar with the concept of 
average. An average is a measure which describes what is typical of a 
group with respect to some trait, characteristic, or variable. If we are 
comparing two or more groups, the determination of an average for each 
group permits a better appraisal of possible group differences than would 
be obtained by casual examination of the data. There are various statistical 
measures, or types of averages, which have proven useful as descriptive 
terms for a variety of data. One aim of this book is to present and discuss 
the descriptive statistical measures most frequently needed in psychological 
research. Proper usage and interpretation of these terms and evaluation 
of their use by others are not possible without knowledge of their meaning 
and their limiting assumptions. Incidentally, the user of statistical measures 
must give some thought to computational procedures. 
As we proceed, it will be necessary not only to define descriptive
measures but also to distinguish between the usage of a given measure as
being descriptive of a sample as opposed to a population. Since sample
descriptive statistics are knowns (i.e., computable) whereas the corresponding
population values are unknowns (but estimable), we will in this book
define and discuss the descriptive measures in terms of samples and
subsequently consider the problem of drawing inferences about, or estimating,
population values. Sample values are frequently referred to as statistics
and population values are called parameters.

That part of statistical analysis which has to do with the drawing of 
inferences is imposed on us because of certain inadequacies of research 
data. For instance, an investigator who wishes to know the average 
height of adult women in the United States will never have facilities for 
measuring every woman. Accordingly, he is compelled to measure a 
sample of women; then on the basis of information yielded by the sample 
he can make an inference concerning the average height of the population 
of women. Another investigator, wishing to evaluate the relative merits of 
two learning methods, tries out the methods with two small groups of 
students, and from the results, makes an inference concerning what might 
be expected if he had facilities for working with very large groups. An 
opinion poller may seek information about the reactions of Republicans 
and Democrats to some world event. By questioning a sample of each 
group he can secure sufficient data for drawing an inference regarding a 
possible difference between the population of Republicans and the popula- 
tion of Democrats. 

The problem of statistical inference is usually that of determining whether 
statistical significance can be attached to results after due allowance is 
made for known sources of error. There are many and varied situations 
for which we need tests of significance, and accordingly several tests are 
available. Intelligent and critical inferences cannot be made by those who 
do not understand the purposes, assumptions, and applicability of the 
various techniques for judging significance. 

It is in connection with the problem of drawing inferences that a 
knowledge of statistical methods is most helpful. A research should be 
planned in such a way that the resulting data are amenable to treatment by 
the available statistical techniques. With sufficient information concerning 
these techniques of analysis, one should be able to lay out in advance of data
collecting the main types of statistical analysis to be used. If a proposed 
experimental setup precludes the possibility of adequate analysis, it may 
be found that a slight alteration in the plan will remedy the situation. All 
too frequently the statistician is called in to help with data which have not 
been collected in such a manner as to permit efficient analysis. Only by 
knowing the available methods of analysis can one plan a research with
assurance that the results can be handled statistically. 

Another reason for keeping in mind statistical considerations while 
planning a research is the fact that some experimental designs are prefer- 
able because they permit, with small additional cost, or even at a saving, 
better control of error than other plans. Indeed, certain designs lead to a
marked reduction in known sources of error. 

A third reason for planning with foresight regarding the statistical 
analysis is that a set of data can sometimes be made to serve for checking 
several different hypotheses. 

The student should be warned that he cannot expect miracles to be 
wrought by the use of statistical tools. Although statistical methods have 
an important place in present-day psychological research, it does not 
follow that they can be utilized to salvage data that result from a hap- 
hazardly planned and sloppily executed investigation. No amount of 
statistical juggling can transfigure bad data into acceptable form. It is 
doubtful whether the student who comes to the statistician with a batch of 
data and the question, “Can I compute a correlation coefficient . . . ?”
will make a scientific contribution, but such a student deserves sympathy, 
especially if his major advisor has suggested that he need not worry about 
statistics until he has collected data. 

The purpose of the present book is to acquaint the student with the 
statistical techniques commonly used, to suggest economical computa- 
tional procedures, and to state the assumptions and limitations of the 
various techniques. Whenever the understanding of a particular tech- 
nique can be clarified by a simple derivation, such a derivation will be 
given. Unfortunately, many of the derivations are too complicated 
mathematically to permit consideration in an elementary or intermediate 
treatment. The qualified and interested student will find some of these 
derivations in more advanced textbooks and others in original sources. 

Statistical methods belong in the realm of applied mathematics, and 
consequently extensive scholarship in mathematics is required of those 
who choose to specialize in statistics. It is possible, however, to secure a 
practical working knowledge of statistical techniques without first becom- 
ing a mathematician, provided the deficiency in mathematics is not 
accompanied by an emotional reaction to symbols. 

Within the realm of psychological research there is wide variation in the 
need for statistical procedures. We can find current research reports which 
involve no use of statistics, some which involve very simple statistical 
treatment, still others which lean heavily on the tools of statistics, and a few 
which are highly statistical. We need not shift from one area of investiga- 
tion to another to find this variation, but it is true that certain areas of 
research in psychology have less dependency than others on statistical
procedures. The area of psychology which seems the most dependent on 
statistics is psychological measurement. This dependency is due mainly to 
the very nature of psychological measurement, the theory of which is 
largely statistical.

The presence or absence of statistical analysis per se is not a safe 
criterion for judging the worth of a study—some studies would have been 
improved by the utilization of statistics, whereas others would be better if 
they had been so designed as to depend less on statistical analysis. Except 
for the requirement that the statistical analysis be adequate, there are no 
general rules as to how statistical a research should be. Of two experi- 
mental plans, either of which would provide appropriate data for checking 
a given hypothesis or sets of hypotheses, that plan which calls for simple 
statistical analysis is certainly preferable to the one which requires elabor- 
ate analysis. Experimental control of errors is far better than statistical 
adjustments. 


Chapter 2


TABULAR AND GRAPHIC 
METHODS 


When we are faced with a mass of data, the first manipulative step is 
tabulation or classification. If we are dealing with the number of children 
per family, the tabulation is equivalent to counting the number of one-child 
families, two-child families, etc.; or if we have information on 1000 
persons regarding their national origin, we can tabulate, or count, the 
number of those of German, French, Italian, etc., origin; or these same 
individuals can be classified as to eye color. If we have their heights, we 
can also classify (or tabulate) them as being 58, 59, 60, etc., inches in height, 
and if the shortest person is 58 and the tallest is 78 inches, we would 
tabulate our 1000 into 21 different inch groups. If we also know the 
weights of these individuals, we can classify again, this time as 100, 101, 
up to (say) 229 pounds, and thereby have 130 groups. In all these situations 
we can classify with respect to the given characteristics, but the resulting 
tabulations will show marked differences as we pass from trait to trait. 
For instance, we may have only six national groups, and it will make little 
difference whether Germans or Russians are first on the tabulation sheet. 
Such a characteristic as nationality or eye color is said to be unordered (and 
somewhat discrete). The number of children per family is discrete but can 
be ordered, from least to greatest number. Such a trait as height can also 
be ordered, but it is said to be continuous (nondiscrete) because it is possible 
to have an infinite number of in-between values very closely spaced. Such 
a series is sometimes called graduated. It will of course be obvious that a 
discrete series does not permit of in-between values, e.g., no family can 
have 2½ children.

For most purposes it is adequate if we tabulate, or classify, individuals 
into certain large groups. For example, instead of classifying our 1000
persons into pound groups (130 such groups) it is usually sufficient to 
classify them into broader groups, say 100-109, 110-119, etc., thereby 
obtaining 13 large groups. As a matter of fact, the use of fewer groups has a 
distinct advantage in that the labor of tabulating and computing descriptive 
terms is greatly lessened. The factors influencing the choice of the grouping 
interval are two: first, its size should be such as to permit at least 10 or 12,
but not more than 20, classes or groups; and second, it should promote
tabulating convenience. Suggestions for choosing tabulating intervals are:
(1) determine the range of measures or scores, i.e., the difference between
the lowest and highest; (2) by inspection determine whether the range can
be divided into 12 to 20 equal intervals of some convenient size, say 5 or
10; and (3) let the lower number of each interval be a multiple of the
size of the interval. It is customary to arrange the tabulation sheet with
the highest or largest values of the variable at the top and to use either
dots or tally marks when tabulating. The tallies per interval can be
counted and recorded to the right of the tally marks. This column is
usually labeled f, and the sum of the fs will be N, or the total number of
individuals in all the grouping intervals. Tabulation results in a frequency
table or frequency distribution, such as that shown in the first two columns
of Table 2.1.


Table 2.1. Frequency distribution of IQs for 161 five-year-old boys


Interval      f     Smoothed f     Cumulative f
160-169       1         0.3            161
150-159       0         1.3            160
140-149       3         4.0            160
130-139       9        13.7            157
120-129      29        25.7            148
110-119      39        34.3            119
100-109      35        35.3             80
 90-99       32        25.0             45
 80-89        8        14.0             13
 70-79        2         3.7              5
 60-69        1         1.3              3
 50-59        1         1.0              2
 40-49        1         0.7              1
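
The three suggestions for choosing tabulating intervals can be sketched in code. This is a minimal illustration, not a procedure given in the text: the function name tabulate and the list of weights are hypothetical, invented only to show intervals whose lower numbers are multiples of the interval size, listed with the highest values at the top.

```python
from collections import Counter

def tabulate(scores, size):
    """Group whole-number scores into intervals whose lower numbers
    are multiples of `size` (hypothetical helper for illustration)."""
    counts = Counter((s // size) * size for s in scores)
    lowest, highest = min(counts), max(counts)
    rows = []
    # Highest interval first, keeping intervals that happen to be empty.
    for lower in range(highest, lowest - 1, -size):
        rows.append((lower, lower + size - 1, counts.get(lower, 0)))
    return rows

# Hypothetical weights in pounds, invented for this sketch only.
weights = [104, 118, 121, 125, 127, 133, 136, 139, 142, 158]
table = tabulate(weights, 10)
for lower, upper, f in table:
    print(f"{lower}-{upper}  {f}")

# The sum of the f column is N.
n_total = sum(f for _, _, f in table)
```

With real data one would first check the range, as in suggestion (1), to see whether the chosen size yields a workable number of intervals.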

It should be noted that the expressed interval limits in a frequency table 
are not necessarily the actual limits. Thus, if weight has been taken to the 
nearest pound, the actual limits of the interval 130-139 would be 129.5 
and 139.5; but if the ages of individuals have been taken as at the last 
birthday, the interval 20-24 would have actual limits of 20 and 24.999+.
Obviously for purposes of tabulation we need not use the implied actual 
limits, and for computational purposes we usually need either the lower 
limit or the midpoint of certain intervals, so there is nothing to be gained 
by meticulously labeling the intervals with actual limits. 


GRAPHIC PRESENTATION 


If we scrutinize the tally marks or the frequency table, we can obtain 
some notion as to how the individual values are distributed. A number of 
pictorial schemes have been suggested as aids in the study of frequency 
distributions. It is possible to lay off the various values (or intervals) of 
the variable on the horizontal or x axis, and to let the vertical or y axis
represent the frequency per value or interval. The frequencies of the 
several intervals can be represented by drawing a horizontal line across 
each interval at the height corresponding to the number of cases in that 
interval, and then connecting these horizontals with verticals erected at the 
interval limits. This yields a histogram (Fig. 2.1). Using the same arrange- 
ment of the vertical and horizontal scales, we can merely indicate the 
frequency with a dot or cross placed directly above the midpoint of the 
interval, and then connect the adjacent points with straight lines. This 
results in a frequency polygon (Fig. 2.2). Such a polygon or the corre- 
sponding histogram will usually show irregularities; on the assumption 
that these are due to the operation of chance, we can draw a smooth curve, 
cutting as near the points as possible, and this curve can be thought of as 
giving a better picture than the original polygon.

Fig. 2.1. Histogram for data of Table 2.1.

Fig. 2.2. Frequency polygon for data of Table 2.1.

A curve which is obtained by freehand drawing or by graphic smoothing schemes or by
repeated smoothing of the frequencies by a method of moving averages is 
known as a frequency curve. One method of moving averages is illustrated 
in Table 2.1, in which an average is taken over three intervals. The 
smoothed value for an interval is obtained by summing the frequencies in 
that interval and the two adjacent intervals and dividing by 3. Thus the 
smoothed value for the interval 80-89 is equal to the sum of the frequencies 
2, 8, and 32, divided by 3. For the 90 interval, 8, 32, and 35 are summed 
and divided by 3. The student should plot both the original and smoothed 
frequencies so as to compare the two graphs. 
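
The three-interval moving average just described can be checked with a short sketch. The frequencies are those of Table 2.1; the zero frequencies assumed beyond the lowest and highest intervals, and the function name smooth, are assumptions of this illustration rather than statements from the text.

```python
# Frequencies from Table 2.1, listed from the lowest interval (40-49)
# up to the highest (160-169).
f = [1, 1, 1, 2, 8, 32, 35, 39, 29, 9, 3, 0, 1]

def smooth(freqs):
    """Average each frequency with its two neighbors (hypothetical helper)."""
    # Assumption: the intervals just beyond each end have zero frequency.
    padded = [0] + list(freqs) + [0]
    return [round(sum(padded[i:i + 3]) / 3, 1) for i in range(len(freqs))]

smoothed = smooth(f)
for freq, sm in zip(f, smoothed):
    print(freq, sm)
```

The values for the 80-89 and 90-99 intervals come out as 14.0 and 25.0, agreeing with the worked sums in the paragraph above.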

Although it is relatively easy to depict a frequency distribution by a 
histogram, by a frequency polygon, or by a smoothed frequency curve, 
it is necessary that we note a shift in interpretation as we pass from the 
histogram to the polygon to the curve. In drawing the histogram, we 
are in effect drawing a series of vertical bars with a common boundary for 
any two that are adjacent to each other. Since the height of each bar 
represents a frequency, we may, by arbitrarily assigning unity as the width 
of each bar, say that the area of a bar also represents a frequency. Then
the sum of the areas of the several bars will be the total number of cases,
or N.


If we think of the polygon in Fig. 2.2 as being superimposed on the 
histogram of Fig. 2.1 and imagine that the common boundaries of the 
vertical bars have been erased, we will have a picture like that in Fig. 2.3, 
in which the remaining parts of the bars have the appearance of an up and 
then down irregular staircase. A little thought should convince the reader 
that the total area under this staircase is N, or precisely the same as the
sum of the areas of all the bars. 

Next consider the polygon. Note that as we pass from interval to 
interval, the polygon in conjunction with the staircase histogram forms a 
series of pairs of equal-area triangles. One of each pair is an area included 
under the polygon but not under the histogram, whereas the other is an 
area included under the histogram but not under the polygon. The net 
effect of this balancing of areas, in and out, is that the total areas under the 
polygon and histogram are equal; each total area represents N. 
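
The equality of the two areas can be verified numerically. The sketch below uses the Table 2.1 frequencies with unit bar width; it assumes the polygon is brought down to the base line at the midpoints of the empty intervals just beyond each end, which is an assumption of this illustration.

```python
# Frequencies from Table 2.1, lowest interval first.
f = [1, 1, 1, 2, 8, 32, 35, 39, 29, 9, 3, 0, 1]
width = 1  # unity assigned arbitrarily as the width of each bar

# Histogram: each bar contributes height times width.
hist_area = sum(height * width for height in f)

# Polygon: straight lines join points above adjacent midpoints, so the
# area between two midpoints is a trapezoid. Assumption: the polygon
# starts and ends at zero, one interval beyond each end.
heights = [0] + f + [0]
poly_area = sum((a + b) / 2 * width for a, b in zip(heights, heights[1:]))

print(hist_area, poly_area)
```

Both totals equal N = 161, since each frequency enters two trapezoids at half weight.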

Now it should not stretch our imagination too much to regard the total 
area under a smoothed polygon or under a frequency curve as being equal 
to N. With this notion that area, not height, represents frequency, we can 
readily speak of the area under the curve between ordinates erected at any 
two score values on the base line (x axis) as the number of cases between 
the two score points. And of course the area under any part of the curve 
could be expressed as a proportion or a percentage of the total area. 

This concept of area as frequency will have considerable value for us as 
a basis for interpreting certain statistical measures, and the concept will be 
indispensable to our understanding of certain "ideal," or mathematical, 
frequency curves, as yet undefined. 

Another type of graph can be obtained by the use of cumulative fre- 
quencies. In Table 2.1 is a column headed "Cumulative f." These values 
are obtained by successive adding of the frequencies, beginning with the 
lowest interval. Adding 1 and 1 gives 2, adding to this the next frequency
gives 3, to which in turn is added the next, giving 5, and so on until we have
160 plus 1 for the last cumulative value, which is the total number of cases.

Fig. 2.3. Frequency polygon superimposed on histogram.

Fig. 2.4. Ogive for data of Table 2.1.


Obviously, from the cumulative table we can tell how many individuals 
fall below a given point. If we plot the cumulative values and connect the 
plotted points, an ogive curve results (Fig. 2.4). Note that, in plotting the 
cumulative frequencies, we do not use the midpoint of the interval, but 
rather the upper boundary. Why? 
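
A brief sketch of the cumulative column and the points plotted for the ogive may help with the question just posed. It assumes that the IQs are recorded to the nearest whole number, so that the actual upper boundary of 40-49 is 49.5; that rounding convention is an assumption of the illustration.

```python
from itertools import accumulate

# Table 2.1, lowest interval first: expressed lower limits and frequencies.
lower_limits = list(range(40, 170, 10))  # 40, 50, ..., 160
f = [1, 1, 1, 2, 8, 32, 35, 39, 29, 9, 3, 0, 1]

# Successive adding of the frequencies, beginning with the lowest interval.
cumulative = list(accumulate(f))

# Assumption: scores are taken to the nearest whole number, so the
# actual upper boundary is the expressed upper limit plus 0.5.
upper_bounds = [lo + 9 + 0.5 for lo in lower_limits]

# Each cumulative value counts every case up to the top of its interval,
# so it is paired with the upper boundary rather than the midpoint.
ogive_points = list(zip(upper_bounds, cumulative))
print(ogive_points)
```

The last cumulative value is 161, the total number of cases.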

The use of frequency polygons in the comparison of two groups is quite 
simple and often very enlightening. All that is necessary is to plot the data 
for both groups on the same sheet and with reference to the same axes. If 
the number of cases in the two groups differs markedly, a better com- 
parison can be obtained by converting the frequencies for each group to 
percentages of the total number in each group. Polygons based on per- 
centage frequencies will not portray differences which are merely a reflection 
of differing Ns and therefore are more comparable. A glance at two such 
frequency polygons will reveal whether the two groups show marked 
differences in the trait in question or to what extent the two distributions 
overlap. More refined methods for comparing groups are discussed later. 
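
The conversion to percentage frequencies can be sketched as follows. Both groups are hypothetical, invented only to show two distributions with markedly different Ns placed on a common footing.

```python
def to_percent(freqs):
    """Express each frequency as a percentage of the group total."""
    n = sum(freqs)
    return [100 * x / n for x in freqs]

# Hypothetical frequencies over the same five intervals.
group_a = [5, 10, 20, 10, 5]     # N = 50
group_b = [10, 30, 80, 60, 20]   # N = 200

pct_a = to_percent(group_a)
pct_b = to_percent(group_b)
print(pct_a)
print(pct_b)
```

Plotted as raw frequencies, the second polygon would tower over the first in every interval; as percentages, the two can be compared interval by interval.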

When we wish to picture a discrete series, it is customary to use either 
horizontal or vertical bars, separated from each other, to represent the 
several frequencies. As in the case of frequency polygons and histograms, 
there are no hard and fast rules regarding the heights (or lengths) of the 
bars relative to the horizontal (or vertical) base. The student should 
attempt to avoid extreme lack of proportion. Newspapers and magazines 
often represent frequencies as areas or solids. A circular diagram, or 
pie chart, in which the sizes of the separate sectors represent the percentage 
falling into given groups or classes is sometimes used to picture relative 
frequencies. There is some evidence, and a general consensus of opinion, 
that some type of linear graph is less likely to be misinterpreted than one
that depends on areas or solids. 

Another type of graphical representation is used to picture the relation- 
ship between two variables, e.g. growth in stature and age, or price 
change with year. To make such a line graph, we can lay off time or age or 
trials on the horizontal axis, choose a convenient scale on the y axis for the 
other variable, and then plot the observational values. The line graph 
should be arranged so that the graph is read from left to right and from the 
bottom to the top, and the scales on the two axes should allow the inclusion 
of all observed values of the two variables and at the same time permit of a 
well-balanced or well-proportioned picture. A line graph can be made 
misleading by the choice of the scales on the two axes. For instance, if we 
are plotting the practice curve for card sorting (number of cards sorted on 
y axis, trial number on x axis), it is possible to make a tremendous differ-
ence in the appearance of the graph simply by altering the scale on the y 
axis. Of two curves which represent the same relationship, one (Fig. 2.5) 
would give the impression that the learning had progressed quite rapidly, 
whereas the other (Fig. 2.6) would lead us to think that progress was slow. 
The student will do well to develop a healthy scepticism of all graphs he 
encounters for the simple reason that either scale can be so selected as to 
lead to gross misinterpretation. 

Fig. 2.5. Learning curve (same data as Fig. 2.6).

Fig. 2.6. Learning curve (same data as Fig. 2.5).

It should be noted that smoothing may be applied to line graphs as well
as to frequency polygons. Often, if a line graph is smoothed, the relationship
between the two variables can be more adequately characterized.
Smoothing out the irregularities helps us to see whether the relationship
is linear or logarithmic or parabolic or of some other common type. 
Frequently a verbal description of a curve will aid in understanding 
something of the functional relatedness of the two variables. To state a 
relationship in more exact mathematical language involves the application 
of some form of curve fitting by which the constants of the equation can be 
determined. 

The student who is interested in a complete discussion and treatment of 
graphic methods is referred to books on the subject by Brinton and by 
Arkin and Colton.* 

* Brinton, W. C., Graphic presentation, New York: Brinton Associates, 1939; Arkin,
Herbert, and Colton, R. R., Graphs, how to make and use them, New York: Harper,
1936.


Chapter 3 


DESCRIBING FREQUENCY 
DISTRIBUTIONS 


It has been implied in Chapter 2 that a variable, such as height, IQ, or 
reading ability, can be represented by X, where X takes on various values, 
i.e., varies from individual to individual. Obviously, X is not used here to 
represent an unknown but rather as a symbol for any of several known 
quantities. When a frequency polygon is drawn and smoothed, it is often 
found to be a curve which has a peak or maximum near the center of the 
Xs and drops off gradually toward the base line or x axis on either side 
of the point of maximum value. In other words, a typical frequency curve 
(or polygon) or a frequency distribution can be roughly characterized as one
which shows four chief features: a clustering of individuals toward some 
central value, dispersion about this value, symmetry or lack of symmetry, 
and flatness or steepness. Many variables or traits yield distributions 
which are said to be approximately bell-shaped, but such a description is 
not adequate for scientific purposes. We want to know about what 
particular value and with how much scatter the individual scores are dis- 
tributed, to what extent the distribution is symmetrical, and to what 
degree it is peaked or flat. That is, we need measures of central value or 
tendency, measures of scatter or dispersion or variability, and measures of 
skewness (lack of symmetry) and of kurtosis (peakedness or flatness). 
With such measures, we can describe the distribution mathematically, and 
in such a way that a statistically trained contemporary, say in Melbourne, 
can picture to himself the frequency distribution. 

Thus we are led to a consideration of the various measures of central 
value, dispersion, skewness, and kurtosis. It is adequate and usually more 
economical of time to determine these measures from frequency distributions
rather than from the original undistributed scores. Since the computation
of the descriptive terms frequently involves a determination of the
lower limit or midpoint of a class interval, the student should recall what
has been said about actual and expressed class limits. Obviously, if we
need the midpoint of an interval, it is necessary only to add one-half the 
size of the interval to the actual lower limit, which must be determined by a 
consideration of the nature of the scores or measures which constitute the 
variable. Psychological measurements and test scores are usually treated as 
though rounded to the nearest value. 


MEASURES OF CENTRAL VALUE 


The mode. A glance at a typical frequency distribution will indicate to 
us the most frequently occurring X value, or for grouped data the group of 
X values which has the greatest frequency. This maximal frequency 
roughly defines the mode. For nongrouped data the mode is the X value 
having the greatest frequency, whereas for grouped data the mode is taken 
as the midpoint of the interval which has the greatest frequency. For a
smoothed frequency curve, the mode is the X value at which the curve 
reaches its maximum height. The mode is one indicator of central value, 
but as a descriptive statistic it has serious limitations. If a different size 
interval is used, the mode may be decidedly different. Furthermore, it 
occasionally happens that two nonadjacent intervals have the same maxi- 
mal frequency, thereby yielding two modal values. Such a distribution is 
said to be bimodal, but it should be noted that the bimodality may not be 
real but merely accidental, the resultant of the particular grouping interval 
chosen. In dealing with certain discrete series, like size of family, the 
modal value is apt to be more typical than some other measure of central 
value and therefore should be used, even though as a measure it is subject 
to greater sampling fluctuations than either the mean or the median. 
(The question of sampling cannot be discussed at this time; the student is 
asked to take on faith statements regarding the efficiency of a given 
statistic.) 
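
The rule for the mode of grouped data can be sketched with the frequencies of Table 2.1. The sketch assumes scores rounded to the nearest whole number, so that the actual lower limit of 110-119 is 109.5; the function name grouped_modes is hypothetical. It returns a list because two intervals may share the maximal frequency.

```python
# Table 2.1, lowest interval first.
lower_limits = list(range(40, 170, 10))  # expressed lower limits
f = [1, 1, 1, 2, 8, 32, 35, 39, 29, 9, 3, 0, 1]
size = 10

def grouped_modes(lower_limits, freqs, size):
    """Midpoints of the interval(s) having the greatest frequency."""
    top = max(freqs)
    # Assumption: actual lower limit = expressed lower limit - 0.5;
    # the midpoint is the actual lower limit plus half the interval size.
    return [lo - 0.5 + size / 2
            for lo, x in zip(lower_limits, freqs) if x == top]

modes = grouped_modes(lower_limits, f, size)
print(modes)
```

Regrouping the same scores with a different interval size would generally move this value, which is the limitation noted above.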

The median. As a measure of central value, the median is defined in two 
ways: (1) if the individual scores are arranged in order with respect to 
some trait, the median is the value of the midmost individual if N is odd, or 
lies midway between the two middle individuals when N is even; (2) when 
a distribution has been made, the median is defined as the point on the 
scale such that the frequency above or below the point is 50 per cent of the
total frequency. For grouped data, the median may be determined by the
following steps:

1. Find one-half of N.

2. Count the frequencies in a cumulative manner from the bottom up to
that interval, say the sth, the frequency of which if included would give
more than, if not included less than, N/2 cases. Obviously the median
will fall somewhere in this interval unless exactly half the values fall below
the lower limit of an interval, in which case this lower limit is the median.
Let F_b equal the total frequency up to the sth interval, and let F_s equal the
frequency in the sth interval.


Table 3.1. The calculation of the median


Score       f
310-319     1
300-309     2
290-299     4     N/2 = 25
280-289     1     sth interval is 260-269
270-279     6     F_b = 24, F_s = 12
260-269    12     i = 10
250-259    11     LL_s = 259.5
240-249     8
230-239     2     Mdn = 259.5 + 10 (25 - 24)/12 = 260.33
220-229     0
210-219     3
           --
           50


3. (N/2 - F_b)/F_s will be the proportional distance required in the sth
interval to locate the median.

4. Letting i equal the size of the interval and LL_s the lower limit of the
sth interval, the median will be given by


    Mdn = LL_s + i \frac{N/2 - F_b}{F_s}    (3.1)

This involves the defensible assumption that the scores for the cases falling 
in the sth interval are distributed fairly evenly over the possible score values 
in the interval. 

The calculation of the median is illustrated in Table 3.1, in which is given 
the distribution of scores made by 50 college men on the Brown spool 
packer. The score is the number of spools packed in four 1-minute trials.
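The four steps can be sketched in Python as follows. The function name `grouped_median` and its argument layout (real lower limit of the lowest interval, interval size, frequencies from the bottom up) are assumptions made for this illustration only.

```python
def grouped_median(lowest_real_limit, width, freqs):
    """Median of grouped data by formula (3.1).

    freqs lists the interval frequencies from the lowest interval upward;
    lowest_real_limit is the real lower limit of the lowest interval.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("no cases")
    half = n / 2                      # step 1: one-half of N
    below = 0                         # running total, F_b
    for k, f in enumerate(freqs):     # step 2: count up from the bottom
        lower = lowest_real_limit + k * width
        if below == half:
            return lower              # exactly half the cases fall below this limit
        if below + f > half:
            # steps 3 and 4: proportional distance into the sth interval
            return lower + width * (half - below) / f
        below += f
    raise ValueError("median not located")

# Table 3.1, lowest interval (210-219) first
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
median = grouped_median(209.5, 10, freqs)
print(round(median, 2))
```

For Table 3.1 the loop stops at the interval 260-269 with F_b = 24 and F_s = 12, reproducing the value 260.33.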

The chief merits of the median are its ease of computation, its independence
of extremes (it can be computed even if a known number of extremes
have not been measured), and the fact that it is not affected by the size of
extremes. This last point will be clearer after a discussion of the mean.

The mean. This arithmetic average will already be familiar to most
readers. The mean is defined simply as the sum of all the scores or measures
divided by their number or


    M = \frac{\sum X}{N}    (3.2)


where X represents any score, the symbol Σ means “the sum of,” and N is
the total number of cases. When N is small, this definition form can be 
used to compute the mean, but when N is large, say 50, 100, or more, such 
a method is not economical of time. Ordinarily, when N is large, we make 
a frequency distribution from which it is possible to compute the mean 
and median and other statistical measures. Assuming that the midpoint 
of an interval is typical of all the individuals in the interval, we can obtain 
the mean by summing the products of the several midpoints times their 
respective frequencies and dividing this sum by N. The error introduced 
by the use of midpoints is nonsystematic, i.e., tends to be ironed out so far 
as the computed mean is concerned. 

The computation of the mean can be shortened further by use of an
arbitrary origin and deviations therefrom. The reasonableness of such a
procedure can be readily grasped by considering the problem of determining
the mean height of a group of men. We could measure each man's
height from the floor or as so much in excess of a stationary bar 5 feet from
the floor. The sum of the excesses divided by N will be the mean excess,
and obviously we must add 5 feet to this to obtain the mean height of the
group.

When we have a frequency distribution the arithmetic can be shortened
still further by expressing the deviation from an arbitrary origin in terms of
step intervals. Let AO be the arbitrary origin, i the size of the interval, and
d the deviation of a score from AO in step intervals, so that X = AO + id.
Substituting in the definition formula for the mean we have


    M = \frac{\sum X}{N} = \frac{\sum (AO + id)}{N} = \frac{\sum (AO) + \sum id}{N}


Now Σ(AO) will equal N(AO) because summing a constant N times is the
same as multiplying it by N. As an exercise, the student should demonstrate,
by taking varying numbers each multiplied by a constant, that Σid = iΣd;
a constant can be brought out from under the summation sign. Hence
we have


    M = \frac{N(AO) + i \sum d}{N} = AO + i \frac{\sum d}{N}


Since we started by summing N Xs and since each X is associated with
a d value, we should be summing N ds. That is, the d value for a particular
interval needs to be summed f times (f being the frequency for the interval),
but the sum for a particular interval is simply f times its d. If we replace


Table 3.2. Calculation of the mean


Score       f     d    fd
310-319     1    10    10
300-309     2     9    18
290-299     4     8    32
280-289     1     7     7     Σfd = 235
270-279     6     6    36
260-269    12     5    60     M = 214.5 + 10 (235/50)
250-259    11     4    44
240-249     8     3    24
230-239     2     2     4     M = 214.5 + 47.00 = 261.50
220-229     0     1     0
210-219     3     0     0
           --         ---
           50         235


Σd by Σfd we explicitly indicate that each d is to be summed as often as it
occurs. Accordingly, our computational formula for the mean is written
as


    M = AO + i \frac{\sum fd}{N}    (3.3)


In our algebraic derivation of formula (3.3) the only restriction placed
on AO was that it be the midpoint of an interval; hence we are free to
choose arbitrarily the midpoint of any interval as AO. In order to avoid
negative ds, AO is ordinarily taken as the midpoint of the lowest interval.

Table 3.2 indicates the computation of the mean from grouped data by use
of an arbitrary origin and deviations therefrom in terms of step intervals.

If we had taken AO near the center of the distribution we would be
following the so-called guessed average method, a method which has the
advantage of smaller d values but has the disadvantage of both negative
and positive ds.

Parenthetically, it might be pointed out that the use of the arbitrary
origin, step-interval scheme is analogous to using coded scores. If we
regard d as a coded value, we see from X = AO + id that d = (X - AO)/i,
or that in general we have a coded score X_c = (X - K)/k, with K and k so
chosen as to give coded values ranging from zero to between 10 and 20.
Then the mean of the original scores is given by M = K + k times the
mean of the coded scores.
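As a check on formula (3.3), the following Python sketch computes the mean of the Table 3.2 distribution both by the arbitrary-origin shortcut and by the longer method of multiplying each midpoint by its frequency. The function name `grouped_mean` is an assumption for illustration.

```python
def grouped_mean(arbitrary_origin, width, freqs):
    """Formula (3.3): M = AO + i * sum(f * d) / N.

    d = 0, 1, 2, ... counts step intervals upward from the lowest interval,
    whose midpoint is taken as the arbitrary origin AO.
    """
    n = sum(freqs)
    sum_fd = sum(f * d for d, f in enumerate(freqs))
    return arbitrary_origin + width * sum_fd / n

# Table 3.2, lowest interval (210-219, midpoint 214.5) first
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
mean_short = grouped_mean(214.5, 10, freqs)

# Longer method: sum of midpoint times frequency, divided by N
midpoints = [214.5 + 10 * d for d in range(len(freqs))]
mean_direct = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)

print(mean_short, mean_direct)
```

Both routes give 261.50, since d is simply the midpoint coded as (X - 214.5)/10.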

The beginning student who is puzzled about which measure to use, the 
median or the mean, should remember that the purpose of measures of 
central value is description. When we attempt to reduce a mass of scores 
or a distribution of measures to a few descriptive constants, the mean and 
median are both descriptive terms which more or less adequately depict 
the “average” or typical score, and the choice between the two is frequently 
determined on the basis of which is more typical. Thus, if six men run 
100 yards in 9.6, 9.7, 9.8, 9.9, 10.0, and 14.0 seconds, the mean value of 
10.5 is not as typical as the median value of 9.85. In general, the mean is 
not as typical as the median when there are extreme measures in one direc- 
tion. However, when the scores are distributed in an approximately 
symmetrical fashion, the mean and median will be equal or nearly so, and 
either will be as typical as the other. The mean in this case has two distinct 
advantages over the median. (1) It is usually a more stable measure in the 
sampling sense, i.e., if we regard our scores as based on a sample of N 
individuals and then take another sample, the means of the two samples 
will in general show closer agreement than the two medians. This point 
will be discussed in more detail in the chapter on sampling errors. (2) It can
be handled arithmetically and algebraically. The student should prove
that, if the mean of N_1 cases is M_1, and of N_2 cases is M_2, the mean of the
two groups combined will be given by


    M_c = \frac{N_1 M_1 + N_2 M_2}{N_1 + N_2}


The median cannot be handled in such a fashion. Furthermore, the mean
is used in connection with more advanced topics in statistics, whereas the
median is seldom mentioned. Thus, unless the distribution is markedly
skewed, the mean should be used. The problem of describing skewness
will receive consideration after measures of variation have been discussed.
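The two points of this discussion, the pull of an extreme score on the mean and the ease with which means of groups can be combined, can be verified with a short Python sketch. The six running times are those given in the text; splitting them into two groups is an arbitrary choice made only to exercise the combined-mean formula, and `combined_mean` is a hypothetical helper name.

```python
import statistics

times = [9.6, 9.7, 9.8, 9.9, 10.0, 14.0]   # seconds for 100 yards
mean_time = statistics.mean(times)          # pulled upward by the 14.0
median_time = statistics.median(times)      # midway between 9.8 and 9.9

def combined_mean(n1, m1, n2, m2):
    """Mean of two groups combined, weighting each mean by its N."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)

# Arbitrary split into two groups to illustrate the formula
group1, group2 = times[:2], times[2:]
pooled = combined_mean(len(group1), statistics.mean(group1),
                       len(group2), statistics.mean(group2))

print(round(mean_time, 2), round(median_time, 2), round(pooled, 2))
```

The mean is 10.5 and the median 9.85, and the combined mean of the two subgroups agrees with the mean of all six scores.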

As exercises, the student should show algebraically or to his own
satisfaction by numerical examples that: (1) if a constant is added to or
subtracted from the scores of a group, the new mean will be M + C or
M - C, where C is the given constant and M the mean of the original
scores; (2) if all the scores are multiplied by a constant, C, the new mean
will be CM, whereas dividing by a constant will lead to M/C as the new
mean.


MEASURES OF VARIATION


The description of the extent of scatter (or cluster) about the central 
value may be obtained by any one of several measures. These measures 
differ somewhat in interpretation and usefulness. One may doubt whether 
the range (highest to lowest score) is of sufficient value in psychological 
research to justify its use as a measure of variation. It is, obviously, 
determined by the location of just two individual measures or scores, and 
consequently tells us nothing about the general clustering of the scores 
about a central value. 

Quartile deviation. An easily computed description of dispersion is the 
quartile deviation (Q), defined as (Q_3 - Q_1)/2, in which Q_3 (or the third
quartile) is the point above which one-fourth of the cases fall and Q_1 (or the
first quartile) is the point with three-fourths of the cases above. Q_2 (or the
median) has already been defined as the point above which one-half of the
cases fall. The computation of the two quartiles Q_3 and Q_1 from grouped
data is essentially the same as that of the median. For instance, in deter- 
mining the third quartile we count up to the interval in which the point 
falls which divides the number of cases into two parts: three-fourths below 
and one-fourth above. The distance into this interval is found in exactly 
the same manner as in computing the median. Since the quartiles are not 
influenced by extremes, it is customary to use them along with the median. 
By definition, 50 per cent of the cases fall between the first and third 
quartiles, but in nonsymmetrical distributions it is not likely that the limits 
indicated by the median plus and minus Q will include 50 per cent. It 
would seem better to report both the first and third quartiles, instead of Q, 
since these values along with the median make it possible to picture 
whether or not the clustering above the median is different from that below 
the median. 
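Since the quartiles are located in the same manner as the median, one interpolation routine serves for all three. The sketch below applies it to the distribution of Table 3.1; the function name `grouped_point` is an assumption for this illustration.

```python
def grouped_point(fraction, lowest_real_limit, width, freqs):
    """Score below which the given fraction of the cases falls.

    Counts up from the bottom to the interval containing the point, then
    interpolates within that interval exactly as for the median.
    """
    target = fraction * sum(freqs)
    below = 0
    for k, f in enumerate(freqs):
        if f > 0 and below + f >= target:
            lower = lowest_real_limit + k * width
            return lower + width * (target - below) / f
        below += f
    raise ValueError("fraction must lie between 0 and 1")

# Table 3.1, lowest interval (210-219) first
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
q1 = grouped_point(0.25, 209.5, 10, freqs)
q2 = grouped_point(0.50, 209.5, 10, freqs)   # the median
q3 = grouped_point(0.75, 209.5, 10, freqs)
quartile_deviation = (q3 - q1) / 2

print(q1, round(q2, 2), q3, quartile_deviation)
```

Here Q_1 = 248.875, Q_3 = 272.0, and Q = 11.5625. The median lies 11.46 above Q_1 and 11.67 below Q_3, so in this distribution the two halves of the middle 50 per cent are of nearly equal width.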

Percentiles. Closely allied to the quartiles are the percentiles. The Pth 
percentile is defined as a point below which P per cent of the cases fall. 
Thus the median is the 50th, the third quartile the 75th, and the first 
quartile the 25th percentile. The 10th, 20th, ..., 90th percentiles are
sometimes called deciles. The computation of the percentiles from grouped
data is accomplished in the manner indicated for computing the quartiles.
The location of the zeroth and 100th percentiles is always perplexing. Since
these two points are dependent upon the location of just two scores
(i.e., are greatly influenced by chance), they are difficult to interpret.
Common sense would suggest that the concept of these two percentiles be 
dropped. 

Percentiles may readily be associated with the cumulative frequency
distribution, and with the ogive curve if cumulative percentage frequencies
(obtained by dividing the cumulative fs by N and multiplying by 100) are used
along the ordinate when plotting the ogive. In fact, the ogive may be used as
a graphic scheme for determining score values corresponding to given
percentiles. For instance, if
we wish to obtain the 25th percentile point, we find 25 on the ordinate 
scale, proceed horizontally to the ogive curve, then vertically to the x axis, 
and read off the score corresponding to the 25th percentile. Scrutiny 
of Fig. 2.4 will help the student understand the process. Could we also use 
the ogive as a basis for determining the percentile value of a given score? 
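One way to explore the question just raised is to read the ogive in the reverse direction by linear interpolation within intervals. The Python sketch below does so for the Table 3.1 distribution; `percentile_rank` is a hypothetical name, and the straight-line interpolation is the same assumption of evenly spread scores made in locating the median.

```python
def percentile_rank(score, lowest_real_limit, width, freqs):
    """Per cent of cases falling below a given score, by interpolating the ogive."""
    n = sum(freqs)
    below = 0.0
    for k, f in enumerate(freqs):
        lower = lowest_real_limit + k * width
        upper = lower + width
        if score >= upper:
            below += f                              # whole interval lies below
        elif score > lower:
            below += f * (score - lower) / width    # part of the interval
            break
        else:
            break
    return 100 * below / n

# Table 3.1, lowest interval (210-219) first
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
rank_at_q3 = percentile_rank(272.0, 209.5, 10, freqs)
rank_at_limit = percentile_rank(259.5, 209.5, 10, freqs)
print(rank_at_q3, rank_at_limit)
```

A score of 272.0 has a percentile rank of 75, and the lower limit 259.5 has a rank of 48, since 24 of the 50 cases fall below it.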

The use of the difference between percentiles as an indication of
dispersion should be obvious. In fact, the 10th-90th percentile range is a
somewhat better (more stable from sample to sample) measure of dispersion
than the quartile deviation. Percentiles, however, are chiefly of value in
reporting the scores of individuals on psychological and educational tests.
Ordinarily a raw score gives no inkling of what it means, whereas when it is
said that an individual scores at or near the 85th percentile, the implication
is that 15 per cent of his fellows score higher or better than he. Thus a
percentile score carries with it some idea of the location of the individual
with reference to the group. Furthermore, percentile scores for entirely
different tests are comparable if derived from the same group or sample.
The original raw scores might be different units, e.g., number of additions
per minute and time to read a page of prose, and consequently not at all
comparable.


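Average deviation. The average deviation (AD), sometimes called the mean
deviation or mean variation, is defined as the average of the deviations
from the mean. Thus, if x = X - M, then AD = Σ|x|/N, where |x| is the
absolute value of x, i.e., the negative signs are disregarded. The average
deviation is seldom used; the student, however, needs to know something
about it if he reads the earlier research literature in psychology.

Contrasted with the quartile deviation, the average deviation gives more
weight to extremes, and for the usual bell-shaped distribution the mean
plus and minus AD will include about 57.5 per cent of the cases. The
average deviation is larger than Q but not so large as the standard deviation
to which we now turn.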

The standard deviation. A third measure of variation, the standard
deviation, S, is defined as


    S = \sqrt{\frac{\sum x^2}{N}}    (3.4)


where x = X - M. To compute the standard deviation directly from this
formula would be very cumbersome, since the deviations would ordinarily
involve decimals. The derivation of a more convenient computational formula
is included here in order further to familiarize the student with the method of
handling summation signs. The derivation will be carried through for S²,
technically known as the variance; then at the end we can take the square
root to obtain S.

From formula (3.4) we have


    S^2 = \frac{\sum x^2}{N}


in which x = X - M.
As in deriving formula (3.3), we can set


    X = AO + id


and since M = AO + i(Σd/N), we have, substituting in x = X - M,


    x = AO + id - \left( AO + i \frac{\sum d}{N} \right)
      = id - ic


where for convenience we let c stand for Σd/N. Then


    x^2 = (id - ic)^2 = i^2 (d - c)^2
    \sum x^2 = i^2 \sum (d - c)^2
             = i^2 \left( \sum d^2 - 2c \sum d + N c^2 \right)


Dividing both sides by N, we have


    \frac{\sum x^2}{N} = i^2 \left[ \frac{\sum d^2}{N} - 2c \frac{\sum d}{N} + \frac{N c^2}{N} \right]
                       = i^2 \left[ \frac{\sum d^2}{N} - 2 \left( \frac{\sum d}{N} \right)^2 + \left( \frac{\sum d}{N} \right)^2 \right]
                       = i^2 \left[ \frac{\sum d^2}{N} - \left( \frac{\sum d}{N} \right)^2 \right]
                       = \frac{i^2}{N^2} \left[ N \sum d^2 - \left( \sum d \right)^2 \right]


hence


    S = \frac{i}{N} \sqrt{N \sum d^2 - \left( \sum d \right)^2}


But since this form does not make explicit the fact that each d, and each
d², must be summed as often as it occurs, we will insert f for the frequency
of occurrence. Thus our computational formula becomes


    S = \frac{i}{N} \sqrt{N \sum fd^2 - \left( \sum fd \right)^2}    (3.5)


where Σfd = the algebraic sum of deviations (in step intervals) from an
arbitrary origin, and Σfd² = the sum of the squares of the deviations (in
step units). The arbitrary origin may be taken as the midpoint of the lowest
interval or as a guessed average near the center of the distribution. The
advantage of the latter procedure is that the ds will be relatively small and
consequently will not lead to the handling of large numbers, whereas the
first procedure avoids the use of negative numbers and is more readily
adapted to machine computation.

The computation of S from grouped scores is illustrated in Table 3.3, which
is identical to Table 3.2 except that we now have an fd² column. It is

Table 3.3. Computation of S by use of an arbitrary origin


Score       f     d    fd    fd²
310-319     1    10    10    100
300-309     2     9    18    162
290-299     4     8    32    256
280-289     1     7     7     49     By formula (3.5):
270-279     6     6    36    216
260-269    12     5    60    300     S = (10/50) √[50(1339) - (235)²]
250-259    11     4    44    176
240-249     8     3    24     72       = 21.66
230-239     2     2     4      8
220-229     0     1     0      0
210-219     3     0     0      0
           --         ---   ----
           50         235   1339


easily seen that the fd² values can be obtained by multiplying the fd values
by the corresponding ds. If we regard d as a coded score (= X_c) with i as
the constant k, we see that (3.5) is appropriate for computing S by way of
coded scores.
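The arithmetic of Table 3.3 can be reproduced with the Python sketch below, which also checks formula (3.5) against the definition formula (3.4) applied to the interval midpoints. The function name `grouped_sd` is an assumption made for illustration.

```python
import math

def grouped_sd(width, freqs):
    """Formula (3.5): S = (i/N) * sqrt(N * sum(f d^2) - (sum(f d))^2).

    d = 0, 1, 2, ... counts step intervals upward from the lowest interval.
    """
    n = sum(freqs)
    sum_fd = sum(f * d for d, f in enumerate(freqs))
    sum_fd2 = sum(f * d * d for d, f in enumerate(freqs))
    return (width / n) * math.sqrt(n * sum_fd2 - sum_fd ** 2)

# Table 3.3, lowest interval (210-219) first
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
s_short = grouped_sd(10, freqs)

# Definition formula (3.4), using midpoints as the scores
n = sum(freqs)
midpoints = [214.5 + 10 * d for d in range(len(freqs))]
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
s_definition = math.sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / n)

print(round(s_short, 2), round(s_definition, 2))
```

Both computations give 21.66; the shortcut merely avoids the decimal deviations from the mean.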

The fd and fd² columns need not appear on the work sheet when we are
computing the mean and standard deviation by a Monroe or Marchant or
Friden type calculating machine. The two required sums can be obtained
by punching in the lowest d in the right-hand part of the keyboard and
the corresponding d² just left of the center of the keyboard, multiplying
both simultaneously by the given frequency, and then, without clearing
the lower dial, punching in the next larger d and its square, and so on. The
successive products so obtained will be accumulated by the machine so that
Σfd is read directly from the right-hand side of the lower dial, and Σfd²
is read from near the center of the same dial. If either an 8- or 10-bank
machine is used, the ds of 9 and less are punched in the right-hand column
of the keyboard, and higher values will of course require the first two
columns. The squares of the ds will ordinarily be less than 400, rarely
greater than 961, so that their values can be punched in columns 6, 7, and 8.
The student should note that the squares of 1, 2, and 3 are to be punched in
column 6, the squares of 4 to 9 in columns 6 and 7, and the squares of 10
to 31 in columns 6, 7, and 8. The sum of the squares will appear in the
lower dial from window 6 to the left. With a little practice the two
required sums for a distribution of 15 intervals and 200 cases can be
obtained in less than a minute. It should not be necessary to say that the
computation should be done twice as a check.

For use with a calculator, formula (3.5) has an advantage over formulas 
which involve two divisions under the radical. Thus we place the sum of 
the squares in the right-hand side of the keyboard, multiply by N, and 
leaving the product in the lower dial, punch the sum of the ds in the
keyboard and subtract it Σfd times, and then from the dial copy the value of
NΣfd² - (Σfd)².

Briefly summarizing, it will be noted that (1) with a machine, Σfd and
Σfd² taken from an arbitrary origin at the bottom of the distribution are no
more difficult to compute than when taken from a guessed average, (2) all 
sums are positive, and (3) the two sums necessary for determining both the 
mean and standard deviation can be obtained in the same operation. It is 
helpful to write the d column in red on the work sheet, thereby throwing it 
into contrast with the f column. 

When N is small and the scores are not too large, S can be computed
economically by way of the original (raw) scores. The definition formula,
(3.4), calls for Σx². Note that since each x = X - M, we have


    \sum x^2 = \sum (X - M)^2 = \sum X^2 - 2M \sum X + \sum M^2


Replacing the last Σ by N (we are summing M² N times) and replacing M
by ΣX/N, we have


    \sum x^2 = \sum X^2 - 2 \frac{\sum X}{N} \sum X + N \left( \frac{\sum X}{N} \right)^2

    \sum x^2 = \frac{1}{N} \left[ N \sum X^2 - \left( \sum X \right)^2 \right]    (3.6)


Substituting in formula (3.4) leads to an N² in the denominator, which can
be brought out as 1/N. Hence we have


    S = \frac{1}{N} \sqrt{N \sum X^2 - \left( \sum X \right)^2}    (3.7)


All the scores are simply squared and then summed to get ΣX², and ΣX
has the same meaning as in formula (3.2).

Although a mean computed by formula (3.3) from grouped data will
not err systematically from the value obtained by formula (3.2), the use of 
formula (3.5) for calculating S tends to give a value which is too large when 
compared with the nonapproximate value yielded either by (3.4) or by 
(3.7). The reason for this is easily explained at the blackboard—we give 
here a hint. In general for an interval below the mean there will be more 
scores above than below the midpoint of the interval, whereas for an 
interval above the mean there will be more scores below than above the 
midpoint. Thus in taking the several midpoints as representing the scores 
within the several intervals, we are in effect using values which deviate too 
far from the mean. 
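The raw-score formula (3.7) is easily checked by machine. The Python sketch below applies it to a small set of hypothetical scores (invented purely for illustration) and compares the result with the definition formula (3.4) and with `statistics.pstdev`, the standard-library routine that divides by N as S does.

```python
import math
import statistics

def sd_raw(scores):
    """Formula (3.7): S = (1/N) * sqrt(N * sum(X^2) - (sum(X))^2)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt(n * sum_x2 - sum_x ** 2) / n

def sd_definition(scores):
    """Formula (3.4): square root of the mean squared deviation from M."""
    n = len(scores)
    m = sum(scores) / n
    return math.sqrt(sum((x - m) ** 2 for x in scores) / n)

scores = [12, 15, 9, 20, 14, 11, 17, 14]   # hypothetical raw scores
s_raw = sd_raw(scores)
s_def = sd_definition(scores)
s_lib = statistics.pstdev(scores)
print(round(s_raw, 4), round(s_def, 4), round(s_lib, 4))
```

All three agree, as the algebra leading to (3.7) requires.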


We may correct for the systematic error involved in using formula (3.5)
by substituting in


    S_{cor} = \sqrt{S^2 - \frac{i^2}{12}}    (3.8)


The i²/12 is known as Sheppard's correction for grouping. The uncorrected
and corrected values differ but little when 12 or 15 intervals have been used,
and as the number of intervals is increased, the difference becomes smaller
and smaller. If fewer than 10 intervals have been used, the error may be
appreciable and the correction should be applied. These considerations
form the basis for the suggested rule that at least 10 or 12, and not more
than 20, intervals be used.
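To see the size of the correction in a concrete case, the sketch below applies formula (3.8) to the S of Table 3.3, where 11 intervals of size 10 were used. The function name `sheppard_corrected_sd` is an assumption for this illustration.

```python
import math

def sheppard_corrected_sd(s_grouped, width):
    """Formula (3.8): S_cor = sqrt(S^2 - i^2 / 12)."""
    return math.sqrt(s_grouped ** 2 - width ** 2 / 12)

# S from Table 3.3 by formula (3.5): N = 50, sum fd = 235, sum fd^2 = 1339, i = 10
s_grouped = (10 / 50) * math.sqrt(50 * 1339 - 235 ** 2)
s_corrected = sheppard_corrected_sd(s_grouped, 10)

print(round(s_grouped, 2), round(s_corrected, 2))
```

The uncorrected value of 21.66 drops to about 21.46, a change of less than 1 per cent, which is in line with the remark that the two values differ but little when the number of intervals is near 12.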
Regarding the interpretation of the standard deviation, it can be said
that, when we have the usual symmetrical bell-shaped distribution, about
68 per cent of the cases will fall between the limits plus and minus 1S from
the mean, about 95 per cent between plus and minus 2S, and nearly all the
cases (99.73 per cent) between plus and minus 3S. The standard deviation,
even more than the average deviation, gives weight to extremes and
therefore may not be as good as the quartiles for describing the dispersion.
The standard deviation has decided advantages over other measures of
dispersion. (1) Typically, it is more stable from the sampling point of view.
(2) It can be handled algebraically, i.e., if we have two groups of N_1 and N_2
cases, with M_1 and M_2 and S_1 and S_2 as the respective means and standard
deviations, we can obtain the standard deviation for two groups combined
by


    S_c = \sqrt{\frac{N_1 (M_1^2 + S_1^2) + N_2 (M_2^2 + S_2^2)}{N_1 + N_2} - M_c^2}    (3.9)


where the subscript c refers to the combined group. The mean for the
combined group can be obtained by a formula given on p. 18. Formula
(3.9) can be extended for determining the standard deviation for three or
more groups combined. (3) The standard deviation is a mathematical term
which has considerable importance in more advanced statistical work. It is
usually involved in the determination of sampling errors and is the measure
of variation used in the analysis of variance and in connection with
correlational analysis. Therefore, unless there are definite reasons for not
using it, the standard deviation, instead of the average deviation or Q,
should be used as a description of the amount of dispersion.
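Formula (3.9) can be checked numerically by computing S separately for two groups, combining by the formula, and comparing with the S obtained directly from the pooled scores. In the Python sketch below the two groups of scores are hypothetical, and `combined_sd` is an illustrative name.

```python
import math
import statistics

def combined_sd(n1, m1, s1, n2, m2, s2):
    """Formula (3.9): standard deviation of two groups combined."""
    n = n1 + n2
    m_c = (n1 * m1 + n2 * m2) / n                          # combined mean
    mean_square = (n1 * (m1 ** 2 + s1 ** 2) + n2 * (m2 ** 2 + s2 ** 2)) / n
    return math.sqrt(mean_square - m_c ** 2)

group1 = [12, 15, 9, 20]           # hypothetical scores
group2 = [14, 11, 17, 14, 18]      # hypothetical scores

s_c = combined_sd(len(group1), statistics.mean(group1), statistics.pstdev(group1),
                  len(group2), statistics.mean(group2), statistics.pstdev(group2))
s_pooled = statistics.pstdev(group1 + group2)   # S computed from all scores at once

print(round(s_c, 4), round(s_pooled, 4))
```

The agreement follows because N(M² + S²) is simply ΣX² for each group, so the numerator of (3.9) is the sum of squares for the combined group.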

As an exercise, show that, if a constant is added to or subtracted from 
each of a set of scores, the standard deviation does not change, and that 
multiplying or dividing each by a positive constant will lead to CS or S/C, 
respectively, as the new standard deviation, where S holds for the original
scores and C is the constant.


MEASURES OF SKEWNESS AND KURTOSIS 


If a distribution is not of the symmetrical bell-shaped type, it is not 
sufficient for descriptive purposes to report only the mean and standard
deviation. We also need a measure of the lack of symmetry, i.e., of 
skewness, and frequently it is desirable to describe the distribution still 
further by giving a measure which indicates whether the distribution is 
relatively peaked or flat-topped, i.e., a measure of kurtosis. 

Skewness can be described roughly by a number of measures, such as the 
difference between the mean and median divided by the standard deviation, 
or in terms of quartiles or percentiles. If an adequate and stable description
of skewness is desired and if a measure of kurtosis is also needed, a method
based on moments is to be preferred. 

The first four moments about the mean are defined as follows:


    u_1 = \frac{\sum x}{N}
    u_2 = \frac{\sum x^2}{N}
    u_3 = \frac{\sum x^3}{N}                                        (3.10)
    u_4 = \frac{\sum x^4}{N}


where x represents the deviation of each score from the mean of all the
scores. For purposes of computation, we can determine the moments
about an arbitrary origin, and then from these values we can obtain the
moments about the mean. This procedure has already been employed in
computing the standard deviation; i.e., we took deviations from an
arbitrary origin. (The definition of the standard deviation was in terms of
deviations from the mean.) If we use v to represent moments about an
arbitrary origin, the first four moments about AO can be defined as
follows, where d is the score deviation from AO in step units:


    v_1 = \frac{\sum fd}{N}
    v_2 = \frac{\sum fd^2}{N}
    v_3 = \frac{\sum fd^3}{N}                                       (3.11)
    v_4 = \frac{\sum fd^4}{N}


When the vs have been calculated, the us can be readily determined from
the following relationships:


    u_1 = 0
    u_2 = i^2 (v_2 - v_1^2) = S^2
    u_3 = i^3 (v_3 - 3 v_2 v_1 + 2 v_1^3)                           (3.12)
    u_4 = i^4 (v_4 - 4 v_3 v_1 + 6 v_2 v_1^2 - 3 v_1^4)


The student should note the similarity of the formula in (3.12) for the
second moment to that given for the standard deviation [formula (3.5)].
A measure of skewness defined in terms of moments is


    g_1 = \sqrt{\beta_1} = \frac{u_3}{u_2 \sqrt{u_2}}    (3.13)


For symmetrical distributions the value of g_1 will be zero; hence the
departure of g_1 from zero can be taken as a measure of skewness. The
deviation of g_1 from zero, however, must be considered in light of the
operation of chance or in terms of sampling errors (to be discussed later).
The skewness is said to be positive when g_1 is positive and negative when g_1
is negative.


The degree of kurtosis can be described by


    g_2 = \beta_2 - 3 = \frac{u_4}{u_2^2} - 3    (3.14)


When g_2 is less than zero, the distribution tends to be flat-topped
(platykurtic) whereas for g_2 greater than zero it is relatively peaked with
somewhat higher tails (leptokurtic). When both g_1 and g_2 are zero or near zero,
the distribution is of the usual symmetrical bell-shaped type, which is
referred to as the “normal” frequency distribution.

Formulas (3.13) and (3.14) also define β_1 and β_2, which have been and are
still used as measures of skewness and kurtosis. Recently, the g measures
have come into use because of certain advantages that need not be
discussed here.

It will be noted that the measure of skewness involves taking the third
moment relative to S³ (since u_2 = S²), and that the measure of kurtosis
depends on the fourth moment relative to S⁴. For a given distribution, all
the values of u_2, u_3, and u_4 are in terms of the same measurement unit, say
inches or pounds or IQs or minutes; hence the ratios in formulas (3.13) 
and (3.14) are pure numbers, i.e., are not inches or pounds or IQs or 
minutes. If we have the distribution of the weights and of the heights 
for 1000 individuals, the measure of skewness for the height distribution 
may be compared directly with that for the weight distribution. This is 
true by virtue of the fact that for each we are expressing the third moment 
relative to the amount of variability, both in inches for one distribution, 
both in pounds for the other. Likewise, it can be reasoned that the 
measures of kurtosis for different distributions are comparable, although 
the distributions involve different measurement units. 
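The route from moments about an arbitrary origin to g_1 and g_2 can be traced in the Python sketch below, using the distribution of Table 3.3 purely as an arithmetic illustration (with N = 50 the resulting values would be too unstable to interpret). The helper names `v` and `central` are assumptions; `central` computes the moments directly from deviations about the mean as a cross-check on formulas (3.12).

```python
import math

freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]   # Table 3.3, lowest interval first
width = 10                                     # interval size i
n = sum(freqs)

def v(k):
    """kth moment about the arbitrary origin, in step units, as in (3.11)."""
    return sum(f * d ** k for d, f in enumerate(freqs)) / n

v1, v2, v3, v4 = v(1), v(2), v(3), v(4)

# Moments about the mean by (3.12)
u2 = width ** 2 * (v2 - v1 ** 2)
u3 = width ** 3 * (v3 - 3 * v2 * v1 + 2 * v1 ** 3)
u4 = width ** 4 * (v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4)

g1 = u3 / (u2 * math.sqrt(u2))   # skewness, formula (3.13)
g2 = u4 / u2 ** 2 - 3            # kurtosis, formula (3.14)

def central(k):
    """kth moment about the mean computed directly from deviations, as in (3.10)."""
    return width ** k * sum(f * (d - v1) ** k for d, f in enumerate(freqs)) / n

print(round(u2, 2), round(g1, 3), round(g2, 3))
```

The second moment comes out as 469, the square of the S found in Table 3.3. Because the powers of i cancel in the ratios, g_1 and g_2 would be unchanged if the scores were expressed in any other unit.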

In order to help the reader visualize the meaning of different values for
g_1 as associated with different degrees of asymmetry, Fig. 3.1 has been
prepared.


Fig. 3.1. Polygons with different degrees of skewness.

When we have determined the mean and the second, third, and fourth
moments, and from the moments have derived expressions which tell us the
degree of dispersion, skewness, and kurtosis, we have a description adequate
for most distributions. These measures can be used to determine the type 
of mathematical equation which will fit an observed frequency polygon; 
i.e., we can write the equation of a frequency curve which fits the observed 
frequency distribution. A distribution frequently found in psychological 
research is of the “normal” type, which is sufficiently described by the 
mean and standard deviation. Ordinarily it is not necessary to compute g_1
unless the distribution “appears” to be skewed or to compute g_2 unless the
distribution seems peaked or flat. The nature of the research, the type of
variable being studied, and also the size of the sample are factors which
need to be considered in making a decision as to the necessity for
computing measures of skewness and kurtosis. It is seldom advisable to
compute these measures when N is less than 100.


The student should be apprised of the fact that the rather frequent
occurrence of symmetrical distributions for psychological variables may
result from an artifact, and also that the occurrence of a skewed
distribution may likewise be artifactual. This is true because very few of the
instruments used in psychological “measurement” involve equal unit scales;
the measuring units are frequently arbitrary or even accidental. Many of the
variables are measured simply in terms of the number of items checked or
the number of items correct. The shape of the resulting distributions is
largely determined by the percentage checking the items or by the difficulty
of the items. If the items are of medium difficulty for a group, it can be
expected that the scale will yield a symmetrical distribution when applied
to the group; if the items are easy, the scores will pile up toward the top
(give negative skewness); if difficult, a piling up toward the bottom will
result (positive skewness).




Chapter 4 
DISTRIBUTION CURVES 


By successive smoothing of a polygon (or distribution), we can iron out 
irregularities until the polygon becomes a “smooth” or regular and uniform
curve. We can think of this curve as being similar or nearly identical to 
what we would obtain were we to increase indefinitely the size of our sample 
and at the same time use smaller and smaller grouping intervals. That is, the 
limit of a polygon, as we allow N to approach infinity and the interval size 
to approach zero, is conceived to be a curve which is smooth and regular. 
Now such a uniform curve can usually be described in terms of a mathe- 
matical equation. The student may recall that the general equation for a 
straight line is y = ax + b, and that y = 2x + 3 is the equation for a
particular line, that x² + y² = a² is the equation for a circle of radius a
with the origin or intersection of the abscissa and ordinate at the center,
also that y = a + bx + cx² is the general equation for a parabola. It is
not until we give specific numerical values to the constants that we have 
equations for particular curves. 

Frequency curves can be thought of as representing the relationship 
between two variables: y, or the height of the curve, and x, the variate or 
variable under consideration. Frequency polygons or distributions, even 
when smoothed, may be of various shapes: symmetrical or skewed, 
flat-topped or steep, humped near the center or at one end, bimodal or 
unimodal, J-shaped or U-shaped, falling off gradually or suddenly, etc. 
A complete description of a frequency distribution is obtained when we 
have succeeded in writing the equation of the curve which "'fits" the distri- 
bution. The type of curve to be fitted is chosen on the basis of certain 
criteria derived from the moments and the interrelations among the 
moments. The late Professor Karl Pearson developed the mathematics of a 
System of frequency curves and classified distributions according to several 
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"types" of curves, but a complete exposition of these types is beyond the 
is text. 

RE A bell-shaped curve which is often approximated closely 
by frequency distributions and which is intimately involved in much of 
statistical inference is known as the normal curre. We need to know in 
detail the properties of this curve. | | 

At this point we need to digress briefly to discuss a problem of notation. 
The mean and standard deviation have been defined in terms of an observed
batch of scores for N persons, presumably selected or drawn as a sample
from some defined population of persons. The symbols, M and S, stand
for the sample mean and standard deviation. It is convenient to have
symbols for the corresponding population values (parameters). Let us
let μ (mu) stand for the population mean and σ (sigma) symbolize the
population standard deviation. Rarely will we have numerical values for
μ and σ; M and S may be regarded as estimators of μ and σ.

The general equation for the normal distribution may be written as

    y = \frac{N}{\sigma\sqrt{2\pi}} e^{-(X-\mu)^2 / 2\sigma^2}        (4.1)

for a population of N scores or observations, or as

    y = \frac{N}{S\sqrt{2\pi}} e^{-(X-M)^2 / 2S^2}        (4.2)


for a sample of N scores having g₁ and g₂ values so near zero that one may
regard the distribution as normal in form (within the limits of chance, or
sampling, error, yet to be discussed). Equations (4.1) and (4.2) involve
π (3.1416) and e (2.7183). In each ... deviation units, i.e., wit... The ...
mode also coincide. For values of x other than zero, the height of the
curve will be less than that at the mean. This is evident if it is noted that
the exponent of e is negative. As we go farther in either direction from the
mean, the height of the curve becomes less and less (see Fig. 4.1). The
dropping off is slow at first, then rapid, and then slow again. If we take the
maximum value of y as unity, the ordinate at the point .5σ above the mean
is about .883; at 1σ, about .606; at 2σ, .135; and at 3σ, .011. As we go
still farther from the mean, the value of y becomes smaller and smaller, and
as x approaches infinity, y approaches zero (asymptotic). Theoretically,
the curve never reaches the base line.

For both the frequency polygon and the histogram, the frequency for a 
given interval is represented along the y axis or ordinate, but for smoothed 
curves and for mathematical curves such as that defined by equation (4.1), 
it is advantageous to regard the area under the curve for a particular 
grouping interval on the x axis as indicating the frequency for that interval.
Accordingly, the total area under the curve corresponds to the total
frequency, or N, and the area under any given part of the curve, i.e., the
area between any two X values, can be expressed as a percentage of the
total. For example, the area included between the mean and the point on
the base line 1σ above the mean is 34.13 per cent of the total, and the
area between plus and minus 1σ is 68.26 per cent. The latter percentage has
already been given on p. 24 as one way of interpreting the standard
deviation. The limits plus and minus 2σ will include 95.45 per cent; plus
and minus 3σ, 99.73 per cent; and plus and minus 4σ, 99.9936 per cent.
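
The percentages just quoted can be checked without a table. The following is a minimal sketch in Python, assuming only the standard library; the helper name `area_within` is hypothetical. It rests on the identity that the proportion of a normal curve lying within plus and minus k standard deviations of the mean equals erf(k/√2). A difference in the last digit from a tabled figure may reflect rounding in the printed table.

```python
import math


def area_within(k: float) -> float:
    """Proportion of a normal curve lying within plus and minus k sigma of the mean."""
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(f"within +/-{k} sigma: {100 * area_within(k):.4f} per cent")
    # The area between the mean and +1 sigma is half the +/-1 sigma area.
    print(f"mean to +1 sigma: {100 * area_within(1) / 2:.2f} per cent")
```

Multiplying any of these proportions by N converts it to an expected frequency.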

The foregoing percentages hold for the theoretical curve, and will tend 
to be approximated in the distribution of a sample that tends to follow 
the normal distribution. Strictly speaking, no distribution of scores or
observations in psychology can ever follow the normal curve insofar as the
extremities of the distribution are concerned. 


Fig. 4.1. Normal curve. (Abscissa marked at −3σ, −2σ, −σ, 0, +σ, +2σ, +3σ.)

When we transform a set of scores into relative deviates, or so-called
standard scores (z scores), by

    z = \frac{X - \mu}{\sigma}        (4.3)

or by

    z = \frac{X - M}{S}        (4.4)

we have each score expressed as a deviation from the mean in terms of
fractions and/or multiples of the standard deviation of the distribution of
the scores expressed as Xs. Such a score transformation is ordinarily
based on a sample, hence is accomplished by using (4.4); that is, sample
values are used because the parameters called for in (4.3) are unknowns.

The standard scores obtained by (4.4), or by (4.3) when possible, will
have a mean of zero and a standard deviation of unity, as can be easily
shown. Since the mean of any set of scores is their sum divided by their
number, we have

    M_z = \frac{\sum z}{N} = \frac{\sum (x/S)}{N} = \frac{\sum x}{NS}

Now Σx = Σ(X − M) = ΣX − ΣM = ΣX − NM, but from M = ΣX/N
we have NM = ΣX, hence Σx = ΣX − ΣX = 0 always (irrespective of
distribution shape). Therefore, M_z = 0, always.

Since M_z = 0, each z (or x/S) is a deviation from the mean of all the z
values. If these deviations are squared, summed, and divided by N we have
their variance, the square root of which gives their standard deviation.
Thus,

    S_z^2 = \frac{\sum z^2}{N} = \frac{\sum (x/S)^2}{N} = \frac{1}{S^2} \cdot \frac{\sum x^2}{N} = \frac{S^2}{S^2} = 1

The variance of standard scores is 1, hence the standard deviation is 1.

The change to standard scores is a linear transformation because a manipulation
of (4.4) leads to the (more recognizable) equation for a straight line.
That is

    z = \frac{X - M}{S} = \frac{1}{S} X - \frac{M}{S}

is a straight-line equation giving the relation between z and X; 1/S is
the slope and −M/S the intercept. Such a linear transformation will not
change the shape of the frequency polygon of the scores. This transformation
is equivalent to shifting the origin on the x axis to the point corresponding
to the mean and changing the scale
so as to make the standard deviation equal to unity. The student who is
skeptical about this transformation business should be reminded that 
change of scale is commonplace. We change inches to feet, feet to miles; 
we change from the Fahrenheit to the centigrade temperature scale; etc. 
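
That standard scores have a mean of zero and a standard deviation of unity, whatever the shape of the distribution, can also be verified numerically. The sketch below is an illustration only: the scores are made up, the function name `standard_scores` is hypothetical, and S is computed with N (not N − 1) in the denominator, as in formula (4.4).

```python
import statistics


def standard_scores(scores):
    """Convert raw scores X to z = (X - M) / S, with S computed with N in the denominator."""
    m = statistics.fmean(scores)
    s = statistics.pstdev(scores)  # population form: divides by N, not N - 1
    return [(x - m) / s for x in scores]


if __name__ == "__main__":
    raw = [52, 47, 61, 58, 39, 44, 66, 50]  # made-up scores, for illustration only
    z = standard_scores(raw)
    print("mean of z:", round(statistics.fmean(z), 10))
    print("S of z:   ", round(statistics.pstdev(z), 10))
```

Any other batch of scores, skewed or not, should give the same two results apart from rounding error.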
The N in the equation for the normal curve may be regarded as the total 
area under the curve. (It will be recalled that the area under a histogram 
or under a frequency polygon may be regarded as N.) It will be of con- 
siderable convenience to regard the total area under a normal curve as 
unity. With this and the concept of standard scores in mind, we may
rewrite (4.1) or (4.2) as

    y = \frac{1}{\sqrt{2\pi}} e^{-z^2 / 2}        (4.5)


as the equation for the unit normal curve (unit area, unit standard 
deviation; and mean of 0). Note that this is a general equation in which z 
as a relative deviate may be either a standard score or, as we shall see, 
the deviation of a value (not necessarily a score) taken relative to an 
appropriate standard deviation. 

The value of 1/√(2π) is about .39894, and therefore at z = 0 (i.e., at the
mean) y will equal .39894, which is the maximum y for the normal curve of
unit area and unit standard deviation. The ordinates for other values of z
will be less. For instance, at z = ±1, y = .24197, and at z = ±2, y = .05399.

The percentage area under any part of the curve can be determined by
methods of the calculus. The area under the curve between any two values,
z₁ and z₂, is obtained as the value of the integral

    A = \int_{z_1}^{z_2} y \, dz        (4.6)


Perhaps this expression will be more meaningful to the student who has 
not studied integral calculus if the given area is regarded as composed of a 
large number of strips, each having a tiny base dz and a height of y. For 
each such strip the area will be nearly y dz, and the integral sign in formula 
(4.6) simply means the “sum of" the areas of these tiny strips. 
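
The strip interpretation can be carried out literally. The following is a minimal sketch, assuming 1000 strips with the height of each taken at its midpoint; the function names are hypothetical and the number of strips is an arbitrary choice.

```python
import math


def ordinate(z: float) -> float:
    """Height y of the unit normal curve at z, as in equation (4.5)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def area_by_strips(z1: float, z2: float, strips: int = 1000) -> float:
    """Approximate the area between z1 and z2 as a sum of thin strips of area y * dz."""
    dz = (z2 - z1) / strips
    total = 0.0
    for i in range(strips):
        midpoint = z1 + (i + 0.5) * dz  # take y at the middle of each strip
        total += ordinate(midpoint) * dz
    return total


if __name__ == "__main__":
    print(f"area from z = 0 to z = 1: {area_by_strips(0.0, 1.0):.5f}")
```

With strips this narrow the sum agrees closely with the proportion of about .3413 for the area between the mean and 1σ.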

The student of the calculus will also note that the first derivative of 
either equation (4.1) or (4.5) set equal to zero and solved will yield a 
maximum for the curve when x or z equals zero, thus proving more
rigorously that the mean and mode coincide. If the second derivative is set
equal to zero and solved for x or z, it will be found that the points of
inflection of the curve are located where x is ±σ or z is ±1.

Normal curve table. Because of the widespread use of the normal
curve, tables of proportionate frequencies and ordinates for various z
values are available. The student need not be able to integrate equation
(4.6) in order to understand a table of the normal curve functions. Table A
of the Appendix contains four columns, the first of which is z values. The
second column gives the area of the curve from the mean out to the
corresponding z value, this area being the same whether z is positive or
negative; a given z divides the curve into two parts, and the third column
gives the area of the smaller part. The area of the larger part can be
obtained by adding .5 to the entries in column 2. If we wish to determine
the proportionate area between plus and minus a given z, we should double
the values in column 2. The fourth column gives the y or ordinate for each
of the z values. For purposes of reference, the meanings of the several
entries in Table A are illustrated in Fig. 4.2, in which an ordinate (dotted)
has been erected at a z value of +.8. The area from the mean to +.8 is
found from column 2 as .28814; the area below this point is .78814, and
that above is .21186, of the total area. Note that .78814 plus .21186 equals
unity and that .78814 is .50000 plus .28814. The height of the curve at
z = .8 is found from column 4 as .2897, whereas the maximum height of
.3989 is at the mean.

Fig. 4.2. (Abscissa: x/σ or z.)
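
The entries described for z = .8 can be reproduced directly. The sketch below is an illustration and not a substitute for Table A; the function name `table_a_row` is hypothetical, and the areas are obtained from the error function in the Python standard library rather than by integrating (4.6) by hand.

```python
import math


def table_a_row(z: float):
    """Return (area from mean to z, area of smaller part, ordinate) for a given z."""
    z = abs(z)  # areas and ordinates are the same for +z and -z
    mean_to_z = 0.5 * math.erf(z / math.sqrt(2.0))
    smaller = 0.5 - mean_to_z
    y = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return mean_to_z, smaller, y


if __name__ == "__main__":
    area, smaller, y = table_a_row(0.8)
    print(f"mean to z: {area:.5f}  smaller part: {smaller:.5f}  ordinate: {y:.4f}")
    print(f"larger part: {0.5 + area:.5f}  between -z and +z: {2 * area:.5f}")
```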

It is frequently useful to know the relationship between the various
measures of dispersion for a normal distribution. It can be shown that the
following hold true:

    Q  = .8453AD  = .6745S
    AD = 1.1829Q  = .7979S
    S  = 1.4826Q  = 1.2533AD


It is also useful to know that for an N of 50 the S will be about one-fifth the
range, that for an N of 200 the S will be about one-sixth the range, and
that for an N of 1000 the S will be about one-seventh the range. 

The tabled values for the normal curve are often used in connection with 
problems similar to the following. If a distribution of the heights of men 
is normal with a mean of 68.0 inches and a standard deviation of 2.5, what 
percentage of men are more than 6 feet tall? We find z as the difference
between 72 and 68, divided by S, or z = 1.6; then from Table A we find
the percentage of cases that fall above this value to be 5.48. Suppose that 
the mean IQ of 10-year-old boys is 100 and the standard deviation 16. 
What percentage have IQs between 90 and 110? What percentage of 
10-year-old boys would be classified as "gifted" (IQ above 140)? 

In practice, the answers to the foregoing questions would be approximate 
because M and S would be used in lieu of the population values, and 
because obtained distributions will not be exactly normal in form. 
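
Problems of this kind can also be worked without the table. The following is a minimal sketch, assuming the stated means and standard deviations are the population values and that the distributions are exactly normal; the function names are hypothetical, and `NormalDist` is the normal-curve class in the Python standard library.

```python
from statistics import NormalDist


def proportion_above(x: float, mean: float, sd: float) -> float:
    """Proportion of a normal distribution falling above the value x."""
    return 1.0 - NormalDist(mean, sd).cdf(x)


def proportion_between(low: float, high: float, mean: float, sd: float) -> float:
    """Proportion of a normal distribution falling between low and high."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)


if __name__ == "__main__":
    print(f"men over 72 inches:   {100 * proportion_above(72, 68.0, 2.5):.2f} per cent")
    print(f"IQ between 90 and 110: {100 * proportion_between(90, 110, 100, 16):.2f} per cent")
    print(f"IQ above 140:          {100 * proportion_above(140, 100, 16):.2f} per cent")
```

The first line should reproduce the 5.48 per cent found above from Table A; the other two lines answer the IQ questions under the same assumptions.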

The student will have noted that the answers to problems similar to the 
foregoing are possible by virtue of the fact that the areas and ordinates of 
Table A are for the standard score form of the normal curve with total 
area set equal to unity. By formula (4.4), we can pass from raw scores to 
standard scores and vice versa, and knowing N, we can readily convert 
proportionate areas to frequencies or frequencies to proportions. Thus 
the table can be used with any normal distribution regardless of the original 
measurement units. 

Standard scores. Perhaps it should be pointed out at this place that 
transforming scores, when distributions are normal or approximately so, 
to standard scores leads to new sets of scores which are comparable. For 
example, inches and pounds are not comparable units. If a man is 71 
inches in height and weighs 170 pounds, it is impossible to say whether he 
is taller than he is heavy, but when the 71 inches is transformed to a z of .9 
and the 170 pounds to a z of 1.3, we are able to say that, relative to his 
position in the two distributions, he is heavier than he is tall. Likewise, 
the raw scores on two psychological tests will seldom be comparable; 
changing to standard scores permits comparison, so that it can be decided 
whether a boy's performance on one test is better or worse than his per- 
formance on another. This assumes, of course, a close approximation to 
normality, and that the means and standard deviations used in the trans- 
formations are based on the same or highly similar groups. 

Standard scores, as defined by formula (4.4), will involve both positive 
and negative values and decimal scores. Since these are awkward to use, 
a further transformation is frequently made in such a way as to yield a 
distribution with a preassigned mean and standard deviation, instead of 
the 0 and 1 that hold for the standard scores defined by formula (4.4). If 
we wish a distribution with a mean of 50 and a standard deviation of 10,
we simply multiply each z by 10 and add 50. Multiplying each z by 20
and adding 100 would yield a mean of 100 and a S of 20. Either of these 
transformations will get rid of negative values and permit a sufficient 
number of score values without the use of decimals. In general, if we wish 
to transform a set of scores having a mean, M, and a standard deviation, 
S, to new values to be called Zs, with mean equal to any value K and S 
equal to S', all we need to do is to apply the relationship

    Z = z(S') + K,   or   Z = \left(\frac{X - M}{S}\right)(S') + K

which becomes

    Z = \frac{S'}{S} X - \frac{S'}{S} M + K        (4.7)


The last form is the easier to use in practice, particularly with a calculating
machine. Note that the last two terms will combine numerically and
therefore can be placed in the lower dial as a positive or negative number;
then the numerical value of S'/S can be set in the keyboard as a constant
to be multiplied in turn upon the varying values of X. If the machine has
a continuous upper dial, the best procedure is to multiply by the highest
X first, and then, without clearing the dials, to subtract once for each
successively lower value of X. Care is needed in aligning decimals, a check
on which can be obtained by multiplying by the X nearest M. This should
lead to a value, in the lower dial, that is near K. ... readily run off a
table that gives the valu...
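
The same economy of formula (4.7), computing the constant multiplier S'/S and the combined last two terms once and then applying them to every X, carries over to a modern computing routine. The sketch below is an illustration under that reading; the function name `rescale` and the sample scores are hypothetical.

```python
import statistics


def rescale(scores, new_mean, new_sd):
    """Apply Z = (S'/S) X + (K - (S'/S) M), i.e. (4.7) with its last two terms combined."""
    m = statistics.fmean(scores)
    s = statistics.pstdev(scores)      # S, computed with N in the denominator
    slope = new_sd / s                 # S'/S, the constant multiplier
    constant = new_mean - slope * m    # K - (S'/S) M, computed once
    return [slope * x + constant for x in scores]


if __name__ == "__main__":
    raw = [52, 47, 61, 58, 39, 44, 66, 50]  # made-up scores, for illustration only
    for x, big_z in zip(raw, rescale(raw, new_mean=50, new_sd=10)):
        print(f"X = {x:3d}   Z = {big_z:6.2f}")
```

As a check, an X equal to M transforms to exactly K.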

The comparability of two sets of standard scores ... with the same mean (K)
... tions unless the two distributions ... skewness. This is unlikely ...
distributions from skewed to normal. ... and the resulting scores are known
as T scores. ... lated as to yield a mean of 50 and ... constants are
possible. The detaile... Measurement,* which also include... tion. Suffice
it to say here that T ... the proportion (or percentage) of cases ... those
on that value, ...
on any shape distribution are comparable, provided they have been
determined so as to yield the same mean and standard deviation.
They differ only in the way in which they are computed, the standard 
score being a linear transformation which leaves the shape of the distribu- 
tion unchanged, whereas T scaling changes the distribution to the normal 
form. If we begin with an exactly normal distribution and convert the 
scores to both zs and Ts, there will be a linear correspondence between
the two sets of transformed scores. If their means and sigmas are set 
equal, the Zs and Ts will be equal to each other. 

It will be recalled that the use of percentiles is another way of expressing 
scores on different tests so as to have comparability. The student should 
give sufficient thought to percentiles and standard scores to see how they 
are interrelated when the original scores are normal in distribution. Hint: 
The tabled functions (Table A) of the normal curve may help. The student 
might also demonstrate to his own satisfaction that the difference between 
the 50th and 60th percentile points is not apt to be equal to the difference 
between the 80th and 90th percentile points. 
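
One way of carrying out the suggested demonstration is to express the percentile points in standard score units. The sketch below assumes an exactly normal distribution; the function names are hypothetical, and `inv_cdf` is the standard-library routine that returns the z value below which a given proportion of the area falls.

```python
from statistics import NormalDist


def z_at_percentile(pct: float) -> float:
    """z value below which pct per cent of a normal distribution falls."""
    return NormalDist().inv_cdf(pct / 100.0)


def percentile_gap(low_pct: float, high_pct: float) -> float:
    """Distance in standard score units between two percentile points."""
    return z_at_percentile(high_pct) - z_at_percentile(low_pct)


if __name__ == "__main__":
    print(f"50th to 60th percentile: {percentile_gap(50, 60):.3f} z units")
    print(f"80th to 90th percentile: {percentile_gap(80, 90):.3f} z units")
```

Equal differences in percentile rank thus correspond to unequal score differences, the gap being larger toward the tails of the distribution.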

Kinds of distributions. In anticipation of topics to be discussed, it 
might be well to mention some possible ways of regarding frequency 
distributions. We can have an observed, or sample, distribution of scores 
for a group of N individuals; we can imagine a population distribution of 
scores for either a finite or for an infinite N; and we can conceive of a 
distribution curve defined by a mathematical equation (or function). 
Because of chance factors (as yet undefined herein) we do not expect an
observed sample distribution to be exactly like the distribution of the
population from which the sample is drawn or like a defined mathematical 
distribution. 

Since we are seldom able to measure all members of a population, we 
can only assume that population scores follow some defined mathematical 
distribution. The form of mathematical curve assumed is usually decided 
upon by a consideration of the shape of an observed sample distribution. 
As will be seen later, the reasonableness of the assumption can be checked 
statistically. 

It is possible, however, to show mathematically that under prescribed 
conditions given measures will follow a defined distribution curve exactly. 
We shall refer to such a distribution as theoretical or expected. Strictly
speaking, a mathematical distribution curve holds only for a continuous
variable. If we had the distribution for a discrete variable, such as number 
of children per family, we would never expect that increasing N would 
produce a curve—the variable takes on only point values 0, 1, 2, etc.; 
hence we cannot allow the interval size (see p. 29) to approach zero, 
which is necessary for a smooth curve.

As implied previously, there are distribution curves which are not
normal. We shall introduce other curves (or functions) when needed. 
Thus far, the normal curve has been discussed as a frequency curve, and the 
area interpretation has been in terms of the number of individuals or 
percentage of cases falling between certain score limits. This same curve is 
often spoken of as the normal probability curve, and as such it is regarded 
as a theoretical curve. We shall see, moreover, that there are theoretical
curves other than the normal curve which may be regarded as probability
curves.


Chapter 5 


PROBABILITY AND 
HYPOTHESIS TESTING 


Statistical inference and the testing of hypotheses involve the concept of 
chance, or probability. A simple example will serve to illustrate the 
probabilistic nature of hypothesis testing. Suppose a chap claims that he 
can distinguish between Camels and Lucky Strikes. To test his claim we 
could blindfold him and present him with either a Camel or a Lucky 
Strike (the brand to be presented is determined by tossing a coin). If on 
this one trial he correctly names the brand, we would not be inclined to 
accept his claim since he would have a 50-50 chance of being correct on a 
sheer guessing basis. So we give him a second trial (again, and for any 
subsequent trials, we toss a coin to determine which brand to present to 
him). If he were again successful we might give some credence to his claim 
but someone might ask whether making two correct discriminations could 
happen on the basis of chance. We shall presently see that the chances are 
1 in 4 of getting two correct, i.e., success on two trials could easily occur on 
the basis of chance. 

But suppose he is correct on three trials, then on the fourth trial, and 
also on the fifth; or perhaps he is correct on ten trials, or perhaps on 9 of 10 
trials? Regardless of the number of trials and the number of successes we 
certainly should have some information about chance success, or the 
probability of correctly naming the brands on the basis of chance guessing, 
before we reach a decision regarding the claimed ability to distinguish 
between the two brands of cigarettes. This and similar decision problems 
involve notions of probability, to which we now turn. 

Probability. If we had a box containing 70 white and 30 black balls,
well mixed, and were to draw 1 ball at random, the chance of the drawn
ball's being black is said to be 30 out of 100, and the chance of its being
white would be .70. This can be interpreted to mean that, if we made 1000
random draws, each time replacing the drawn ball and remixing the contents
of the box, the percentage of black balls drawn would be about 30,
and of white draws about 70. If we roll a die, the probability of obtaining
a 4 is 1/6; i.e., a large number of rolls would yield a 4 about 1/6 of the time.
If one tosses a symmetrical coin, it is usually said that there is a 50-50
chance of its landing "heads up", or the probability of a head is 1/2. This
is another way of saying that in the long run the proportion of times that 
the coin lands as a head will be the same as the proportion of times it lands 
as a tail. 

These very simple examples illustrate a definition of probability: if an 
event can happen in A ways and fail in B ways, all possible ways being 
equally likely, the probability of its occurring is A/(A + B) and of its 
failing is B/(A + B). That is, a probability figure is the ratio of the num- 
ber of favorable events to the total number of events, and it is therefore 
necessary that we be able to enumerate events in order to arrive at a prob- 
ability figure. 

If we draw a card from a pack, the probability of obtaining a spade is 1/4,
and the probability of drawing a club is also 1/4, but the probability of
drawing either a spade or a club is 1/4 plus 1/4, or 1/2. If we roll a die, the
probability of obtaining either a 4 or a 5 is 1/6 plus 1/6, or 1/3. These two
situations illustrate the addition theorem of probability: the probability
that either one event or another event will happen is the sum of the probabilities
of their occurrences as single events. (The events must be mutually
exclusive; i.e., if one occurs, the other cannot.)

If we roll a pair of dice, the probability of a 2 on the first and a 5 on the
second is 1/6 times 1/6, or 1/36. If we toss 2 coins, the probability that the first
will land a head and the second a head is 1/2 times 1/2, or 1/4,
the probability that both will land as heads. The result obtained with the
second die or coin is independent of that obtained with the first die or coin.
These two situations illustrate that the probability of the joint occurrence
of independent events is the product of their separate probabilities.

As just indicated, if we toss 2 coins, the probability that the first will
land a head and also the second a head will be 1/2 times 1/2, or 1/4, which is the
probability that both will fall as heads. The probability that the first will
land a head and the second a tail will also be 1/2 times 1/2, or 1/4. But 1 head and
1 tail can also be obtained in a way that is mutually exclusive to the above;
i.e., the first lands a tail and the second a head, and this combination or
event has a probability of 1/4, whence the probability of obtaining 1 head
and 1 tail will be 1/4 plus 1/4, or 1/2. This same result can be arrived at by
listing all the possible combinations and taking the ratio of the number of
favorable to the total number of possible combinations. The possible combinations
are HH, HT, TH, TT, from which we see that 2 out of the 4
possible events are favorable for the occurrence of 1 head and 1 tail. We
also note that 1 out of 4 is favorable to 2 heads.

Suppose we were to toss 3 coins; we would have the following possible
combinations:

    Coin 1    H  H  H  H  T  T  T  T
    Coin 2    H  H  T  T  H  H  T  T
    Coin 3    H  T  H  T  H  T  H  T

The total number of possible "events" is 8, 1 of which is favorable to 3
heads, 3 to 2 heads, 3 to 1 head, and 1 to no heads, thus giving the respective
probabilities of 1/8, 3/8, 3/8, and 1/8. If we were to toss 4 coins, we would
have the following probabilities:

    4 heads   1/16        1 head   4/16
    3 heads   4/16        0 head   1/16
    2 heads   6/16


The student should satisfy himself that these are the correct figures by 
writing down all the combinations possible and counting those favorable 
to any particular number of heads. 
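
The listing and counting can also be done mechanically. The following is a minimal sketch, assuming symmetrical coins so that every combination is equally likely; the function name `heads_distribution` is hypothetical. Note that `Fraction` reduces to lowest terms, so that 4/16 is reported as 1/4 and 6/16 as 3/8.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins: int):
    """Enumerate all equally likely H/T combinations and tally them by number of heads."""
    outcomes = list(product("HT", repeat=n_coins))
    tally = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    return {heads: Fraction(count, total) for heads, count in tally.items()}


if __name__ == "__main__":
    for n in (3, 4):
        dist = heads_distribution(n)
        for heads in sorted(dist, reverse=True):
            print(f"{n} coins, {heads} heads: {dist[heads]}")
```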


BINOMIAL DISTRIBUTION 


The process of determining possible combinations becomes quite 
laborious for, say, 10 coins, but the several probabilities can be obtained 
by the coefficients in the expansion of the binomial (a + b)^n. Thus for
n = 2 (i.e., 2 coins) we have a² + 2ab + b², or 1, 2, 1; for n = 3, a³ + 3a²b
+ 3ab² + b³, or 1, 3, 3, 1; for n = 4 the coefficients are 1, 4, 6, 4, 1. In
each case the sum of the coefficients, 2^n, will be the total possible combinations,
and the coefficients taken as ratios with the common denominator,
2^n, will represent the probabilities for n, n − 1, n − 2, ..., 0 heads.

The student may recall that the general expansion of (a + b)^n is

    a^n + n a^{n-1} b + \frac{n(n-1)}{1 \times 2} a^{n-2} b^2 + \frac{n(n-1)(n-2)}{1 \times 2 \times 3} a^{n-3} b^3 + \cdots

This expansion will contain (n + 1) terms and will terminate in b^n. For
n = 10, we have the following coefficients: 1, 10, 45, 120, 210, 252, 210,
120, 45, 10, 1, which sum to 1024, or 2 to the tenth power. Thus the 
probability that all 10 coins will fall as heads is 1/1024; 9 heads, 10/1024; 
etc. If we plot these values as a frequency polygon—these coefficients are 
frequencies in the sense that they represent the expected number of times 
for 10 heads, 9 heads, etc., out of a total of 1024 tosses—we will have a 
bell-shaped graph which will resemble somewhat the normal curve. 
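
The coefficients for any n can be generated without writing out the expansion. The sketch below is an illustration only; the function name `binomial_coefficients` is hypothetical, and `math.comb` is the standard-library routine for the number of combinations of n things taken k at a time.

```python
import math


def binomial_coefficients(n: int):
    """Coefficients of (a + b)^n, ordered from n heads down to 0 heads."""
    return [math.comb(n, k) for k in range(n, -1, -1)]


if __name__ == "__main__":
    coefficients = binomial_coefficients(10)
    total = sum(coefficients)  # equals 2 to the nth power
    for heads, c in zip(range(10, -1, -1), coefficients):
        print(f"{heads:2d} heads: {c:3d}/{total}")
```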

Another and more useful way, for our purpose, of considering the 
binomial expansion is to use p and q, in the place of a and b, with p defined 
as the probability of success on a single element and q as the probability of 
failure, or q = 1 − p. Thus we would have (p + q)^n. Suppose n = 2;
the expression would be p² + 2pq + q². If p = 1/2, as in the coin situation,
this would give (1/2)² + 2(1/2)(1/2) + (1/2)², or 1/4, 2/4, and 1/4 as the probabilities for
securing 2 heads, 1 head, and 0 head respectively. Each term is itself
a probability fraction; the numerators are 1, 2, and 1 as before. For
n = 10, we would have (1/2)^10 or 1/1024, 10(1/2)^9(1/2) or 10/1024, 45(1/2)^8(1/2)^2 or
45/1024, etc., as the probabilities for obtaining 10 heads, 9 heads, 8 heads,
etc.

The chief advantage of using the p and ... see what happens when p is not
equal to 1/2 ... we roll a pair of dice with ... of "snake eyes." ... as
indicating the ... one-spot, and 0 one-spot. If 3 dice ... 1/216, 15/216,
75/216, 125/216 ... , which can be derived ... the actual distributions
available. The formulas are:


    \mu = np
    \sigma = \sqrt{npq}
    \gamma_1 = \frac{q - p}{\sqrt{npq}}        (skewness)
    \gamma_2 = \frac{1 - 6pq}{npq}        (kurtosis)


Since these formulas, which are for theoretical distributions, specify 
parameters (not values based on a sample), Greek instead of Latin letters 
are used as symbols. 

It should be noted that n is the number of elements, not the number of
cases. The formula for skewness permits several deductions. When p = 1/2,
q also equals 1/2, and hence the skewness is zero; the degree of skewness for a
fixed n depends upon the deviation of p from 1/2, i.e., the smaller or the larger
the probability of success for each element, the more skewed the distribution.
Note also that, since n is in the denominator, the larger the number
(n) of elements, the smaller the skewness for fixed values of p and q.

The above formulas describe the theoretically expected distribution for 
given ns, ps, and qs. As will be seen later, any empirical distribution 
obtained by tossing 10 coins or rolling 3 dice will yield values which, for 
reasons to be discussed, will only approximate these values. 
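
The formulas can be checked against values computed term by term from the expansion of (p + q)^n. The sketch below assumes the formulas read mean = np, σ = √(npq), skewness = (q − p)/√(npq), and kurtosis = (1 − 6pq)/(npq), with skewness taken as the third moment about the mean divided by σ³ and kurtosis as the fourth moment divided by σ⁴, less 3; the function names are hypothetical.

```python
import math


def binomial_moments(n: int, p: float):
    """Mean, sigma, skewness, and kurtosis computed term by term from (p + q)^n."""
    q = 1.0 - p
    probs = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * pr for k, pr in enumerate(probs))
    var = sum((k - mean) ** 2 * pr for k, pr in enumerate(probs))
    third = sum((k - mean) ** 3 * pr for k, pr in enumerate(probs))
    fourth = sum((k - mean) ** 4 * pr for k, pr in enumerate(probs))
    sigma = math.sqrt(var)
    return mean, sigma, third / sigma**3, fourth / var**2 - 3.0


def binomial_formulas(n: int, p: float):
    """The same four values from the closed formulas."""
    q = 1.0 - p
    npq = n * p * q
    return n * p, math.sqrt(npq), (q - p) / math.sqrt(npq), (1.0 - 6.0 * p * q) / npq


if __name__ == "__main__":
    for n, p in ((10, 0.5), (3, 1 / 6)):
        print(n, round(p, 4), binomial_moments(n, p), binomial_formulas(n, p))
```

For 10 coins the skewness is zero; for 3 dice with p = 1/6 it is positive, in line with the deductions drawn above.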

It is of interest to consider plotting the binomial distribution as a 
histogram—the height of the successive bars will indicate the several 
expected frequencies, each of which is the numerator for a probability 
fraction. Now, if we work out the expected frequencies for number of 
heads when 20 coins are tossed, and if in drawing the histogram we scale 
the ordinate so as to have the over-all height about the same as that for the 
10-coin situation and also squeeze the base-line scale (ranging from 0 to 20) 
into about the same over-all distance as for 10 coins, the vertical bars will 
be narrower, and the resulting picture will look more like a normal 
histogram than that obtained for 10 coins. If we repeat the process with 
n larger and larger, each time scaling our axes to about the same size as 
used for 10 coins and for 20 coins, the several bars of the histograms will 
become narrower and narrower, and with n sufficiently large the bars will
seem to merge and the contour of the graph will tend to appear indistinguishable
from a normal curve.

The normal curve is for a continuous variable on the x axis, whereas 
the binomial distribution involves a discrete variable, or point series. For 
example, it is impossible to have any values between, say, 22 and 23 heads. 
As n is taken larger and larger, and the total base line is kept fixed, the
obtained values or possible points become more and more closely spaced
so that the point series approaches, or at least takes on the appearance of,
continuity. As n approaches infinity, the binomial distribution approaches
the normal distribution as a limit.

Approximation of probabilities. The foregoing suggests the possibility
of using the normal curve as a basis for approximating the probabilities 
obtainable by the binomial expansion. In order to see how this might be 
done we shall consider the binomial distribution for n = 16 for the coin
tossing situation, as shown in Table 5.1. Suppose we wish to ascertain the
probability of getting at least 12 heads. This would be the sum of the
separate probabilities of tossing 12, 13, 14, 15, and 16 heads. These
probabilities would be the respective "expected frequencies" each divided
by 65,536; hence the sum of the probabilities would be obtained by
summing the numerators: 1, 16, 120, 560, and 1820, then dividing this
sum, 2517, by 65,536. Thus the probability of securing at least 12 heads
(12 or more) would be 2517/65,536, or a decimal equivalent of .03841
(to 5 places).

Table 5.1. Binomial distribution for 16 coins

    Number of    Expected        Number of    Expected
    Heads        Frequencies     Heads        Frequencies
    16                1           7            11,440
    15               16           6             8,008
    14              120           5             4,368
    13              560           4             1,820
    12            1,820           3               560
    11            4,368           2               120
    10            8,008           1                16
     9           11,440           0                 1
     8           12,870          Total         65,536

Now let us attempt to find the same probability by using the normal 
curve approximation. First we note that for the distribution in Table 5.1 
the mean will be np = 16(.5) = 8 and the σ will be √(npq) = √(16(.5)(.5)) = 2.
It will help us understand the method of approximation if we superimpose
on the histogram of the frequencies in Table 5.1 a normal curve having a
mean of 8 and a σ of 2 (see Fig. 5.1). If we regard the area of each bar as
representing an expected frequency, we see that the sum of the areas for
the bars based on 12, 13, 14, 15, and 16 heads divided by the total area of
all the bars (= 65,536) will give the probability value of .03841 reported
previously. To approximate this by the normal curve we need to consider
the area under the curve for that part of the curve which spans the bars
with base-line values of 12, 13, 14, 15, and 16. Obviously we need the
area under the curve beyond an X value of 11.5, a value which does not
make much sense in terms of number of heads but which does make sense
when it is recalled that we are here treating a point (discrete) variable as
though it were a continuous variable, normally distributed. Hence we
have X − μ = 11.5 − 8 = 3.5 = x, and x/σ = 3.5/2 = 1.75. Turning to
Table A we find that the proportionate area under a normal curve beyond a
z of 1.75 is .04006. This is our approximation to the exact probability
value of .03841; the error in this approximation is of the order of .002.
In general, when n is fairly large the failure to shift .5 (e.g., from 12 to 11.5
as done here) leads to a negligible error. This shift of .5 is referred to as
correction for continuity.

We can, of course, use the normal curve to approximate any of the exact 
probabilities obtainable from Table 5.1 (or from the binomial with n other
than 16). For example, the exact probability of obtaining 10 or 11 or 12 
heads is (8008 + 4368 + 1820)/65,536, or .21661. The normal curve 
approximation, calculated as the proportionate area under the curve from 
9.5 to 12.5, is .21441. 
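
Both the exact probabilities and their normal curve approximations can be computed side by side. The following is a minimal sketch, assuming the correction for continuity is applied at both limits and that an omitted upper limit means the whole upper tail; the function names are hypothetical. A figure computed this way may differ in the last decimal place from one read from a five-place table.

```python
import math
from statistics import NormalDist


def exact_probability(n: int, p: float, low: int, high=None) -> float:
    """Exact binomial probability of low through high successes (high defaults to n)."""
    q = 1.0 - p
    top = n if high is None else high
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range(low, top + 1))


def normal_approximation(n: int, p: float, low: int, high=None) -> float:
    """Normal curve area from low - .5 up to high + .5, or beyond low - .5 if high is omitted."""
    dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1.0 - p)))
    upper = 1.0 if high is None else dist.cdf(high + 0.5)
    return upper - dist.cdf(low - 0.5)


if __name__ == "__main__":
    print(f"12 or more heads: exact {exact_probability(16, 0.5, 12):.5f}, "
          f"normal {normal_approximation(16, 0.5, 12):.5f}")
    print(f"10 to 12 heads:   exact {exact_probability(16, 0.5, 10, 12):.5f}, "
          f"normal {normal_approximation(16, 0.5, 10, 12):.5f}")
```

Rerunning the comparison with larger n shows how the two sets of values draw together.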

It is fortunate for us that for n larger and larger the normal curve
approximation becomes better and better since for n large the computation
of exact probabilities by the binomial method becomes very arduous.

Fig. 5.1. Normal curve fitted to binomial distribution.

Notice that, in approximating the probability, we have utilized an area
under a curve; i.e., we have said that the area between two X values taken 
relative to a total area may be interpreted as a probability figure. This is 
not inconsistent with our original definition of probability involving 
number (frequency) of events favorable relative to a total number of 
events (total frequency). Since, as previously indicated, the total area 
under a frequency curve for a continuous variable (or function) can be 
regarded as the total frequency, and the area for a particular segment can 
be regarded as the frequency with which values (or scores) fall in the given 
segment, it follows that the ratio of the segmental to the total frequency 
may be spoken of as a probability—the probability that a score falls 
between the two X values defining the segment. When we are dealing with 
a distribution of the normal type, the probability associated with a given 
segment is found by converting the two X values, which define an interval, 
into z values and then determining the area from Table A. The obtained 
proportionate area represents the probability expressed as a decimal frac- 
tion. 

It should be obvious, when we consider the unit normal curve, that we 
can readily specify the proportionate area between any two z values, say 
z_1 and z_2, and interpret the proportion as the probability of obtaining z 
values between the given z_1 and z_2. By reference to tables more extensive ... 

The foregoing interpretation of ... curve as probabilities is, in a sense ... 

HYPOTHESIS TESTING 


We may now return to a consideration of the blindfold test of the claimed 
ability to distinguish between two cigarette brands. By using the binomial 
expansion we can readily specify the probability of being correct (by 
chance) n times out of n trials. The answer is simply 1/2^n; if there were 
10 trials the probability of 10 correct choices (by chance guessing, with no 
real discriminatory ability) would be 1/1024, or about .001; the probability 
of being correct 16 out of 16 trials would be 1/65,536, or about .000015. 
If our self-proclaimed expert did succeed in 10 of 10 trials we would, 
because of the small probability of 10 successes by chance, concede that he 
really possessed the ability to discriminate between the two brands. 

But suppose he was successful on 9 trials of a 10-trial series? We could 
readily specify the probability of 9 successes by chance (it would be 
10/1024) but for reasons which will become apparent later, it is better to 
ascertain the probability of as many as 9 successes in 10 trials (at least 9, 
or 9 or more, successes). This probability will be the probability of exactly 
9 successes plus the probability of exactly 10 successes, or 10/1024 + 
1/1024 = 11/1024 = about .01, which is sufficiently small that we might 
decide that his performance was based on ability rather than on chance. 
Note that such a record would occur by chance about 1 time in 100, so 
we couldn’t be sure that he really had the ability. 

Next, let us suppose that he was correct on 8 of the 10 trials. The 
probability of at least 8 successes occurring on a chance basis would 
be 45/1024 + 10/1024 + 1/1024 = 56/1024 = about .05. Would we now 
conclude that he had the claimed ability? If we did so conclude we 
wouldn’t be as sure of our inference as when there were 9 successes, and 
far less sure than when there were 10 successes. In other words, the smaller 
the probability of attaining an obtained number of successes by chance 
the surer we would be of our conclusion. If he were successful on 7 trials 
(probability = P = .17 for 7 or more successes) we would no doubt 
hesitate before conceding that his performance was based on ability to 
discriminate, since 7 successes can too easily occur on the basis of chance 
alone. 
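
The chance probabilities quoted for 10, 9, 8, and 7 successes can be reproduced with exact fractions. This is a small illustrative sketch; the helper name `chance_tail` is an assumption introduced here and is not from the text.

```python
from fractions import Fraction
from math import comb

def chance_tail(successes, trials):
    """P(at least `successes` correct in `trials` guesses) when p = 1/2."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return Fraction(favorable, 2 ** trials)

# Probability of as many as s successes in 10 trials by chance alone
for s in (10, 9, 8, 7):
    p = chance_tail(s, 10)
    print(s, p, round(float(p), 3))
```

Fractions are printed in lowest terms, so 56/1024 appears as 7/128; the decimal value is unaffected.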

We are thus led to the question: What level of probability should be 
adopted as a criterion for deciding whether an observed performance is 
based on ability rather than chance? We are not yet ready to attempt an 
answer to this, but it might be remarked here that in choosing a level of 
probability it is necessary to consider the risk of being wrong in concluding 
that the fellow can discriminate vs. the risk of attributing his performance 
to chance when in reality he does have some ability. 

Whether a person can discriminate between two brands of cigarettes is a 
simple illustration of the problem of statistical inference, or the testing of 
hypotheses. For purpose of inference we set the hypothesis that our friend 
cannot discriminate between brands. This readily permits us to calculate 
the probability (P) of as many successes by chance as he attains on a series 
of trials; if P is sufficiently small we reject the hypothesis of no ability, and 
in so doing we are saying that his number of successes is statistically 
significant, that is, nonchance. The level of significance associated with 
rejection of the hypothesis is represented by a probability: if we agree to 
reject the hypothesis only when the probability of chance success is as low 
as .01, we will have adopted the P = .01 level of significance. If we are 
willing to be less sure and require P to be as low as .05 we will be working at 
the .05 level of significance. Whether we adopt the .01 or the .05 level is 
somewhat arbitrary; for this chapter let us quite arbitrarily choose P = .01 
as our working level of significance. After considering the more detailed 
discussion of this issue later in the chapter, the reader may prefer to adopt 
the .05 or some other level for judging significance. 

The binomial expansion (and normal curve approximation thereto) may 
be used in a wide variety of situations as a means of testing hypotheses. A 
general requisite is that we be able to specify the probability of success (or 
something analogous to success) for a single element (coin, die, trial, etc.). 
In other words, we need to specify p (and q) so as to use (p + q)^n or we need 
to calculate the mean and σ in order to utilize the normal curve approxima- 
tion when n is not small. 


Consider the problem of public opinion polling. In polling studies we 
are usually interested in whether or not a population of potential voters is 
split 50-50 on an issue. Accordingly we set the hypothesis that there is a 
50-50 split in the population. This hypothesis is to be accepted or rejected 
on the basis of information yielded by a sample of N persons, who are 
asked to respond “yes” (agree) or “no” (disagree) to a statement of the 
given issue. Suppose for sake of simplicity we take N = 64 and that 42 of 
them give a yes response. Is this result consistent with the hypothesis of a 
50-50 split? 


To answer this we note that so far as the opinion poller is concerned there 
is, by hypothesis, a 50-50 chance that any individual in the sample will say 
yes (this despite the fact that the individual so far as he is concerned is not 
giving a chance response). Thus the probability of a yes response for a 
single individual is 1/2; that is, p = .5 and q = .5 (since q is always 1 - p). 
Now our sample of 64 is analogous to a trial toss of 64 coins, so we consider 
the binomial distribution with n = N = 64. The mean = Np = 32, and 
the σ = √(Npq) = 4. The number of yes responses, 42, deviates 10 from 
the mean. (Our normal curve approximation would be slightly better if 
we used 41.5 - 32 = 9.5 as our deviate, the correction for continuity.) Thus 
we have z = 10/4 = 2.50. Turning to Table A we find that the area beyond 
a z of 2.50 is about .006. In testing the hypothesis of a 50-50 split we need 
also to include the probability of as large a deviation in the opposite direction; 
hence we double .006 and have P = .012 as the probability of as large a 
deviation irrespective of direction. Since this P is very near our arbitrarily 
(and temporarily) agreed upon P = .01 level for judging significance, we 
reject the hypothesis of an equal split in the population being sampled, and 
this rejection implies that a majority of the population would endorse the 
given statement. 
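
The arithmetic of the polling example can be laid out step by step. The sketch below is illustrative only: `two_tailed_p` is a hypothetical helper, and the tail area comes from the complementary error function rather than from Table A.

```python
from math import erfc, sqrt

def two_tailed_p(z):
    """Probability of a unit-normal deviate as large as |z| in either direction."""
    return erfc(abs(z) / sqrt(2.0))

N, yeses, p = 64, 42, 0.5
mean = N * p                      # Np = 32
sigma = sqrt(N * p * (1 - p))     # sqrt(Npq) = 4

z = (yeses - mean) / sigma                       # uncorrected deviate: 10/4
z_corrected = (abs(yeses - mean) - 0.5) / sigma  # with the correction: 9.5/4

print(z, round(two_tailed_p(z), 3))
print(z_corrected, round(two_tailed_p(z_corrected), 3))
```

Doubling the one-tailed area and evaluating `erfc(|z|/sqrt(2))` are the same computation, which is why no separate doubling step appears in the code.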

In passing, it should be noted that had the number of yes responses been, 
say, 35 we would accept the hypothesis of an equal split. But this accept- 
ance would not prove the hypothesis since 35 could easily be a chance 
deviation from any of a number of splits, 55-45, 54-46, etc. We will have 
more to say about this later. 

Opinion poll results are usually expressed in percentage form, that is, 
as proportions multiplied by 100. Thus the hypothesis of a 50-50 split 
in the population implies .50 or 50 per cent yeses, and our result of 42 yeses 
for a sample of 64 leads to .656 or 65.6 per cent yeses. Accordingly it would 
appear that in testing the significance of the deviation of 42 from 32 we are 
also testing the deviation of .656 from .50 (in proportion units) or 65.6 
from 50 (in percentage units). 

Actually, what we did above was to take 

$$ \frac{x}{\sigma} = \frac{42 - 32}{\sqrt{64(.5)(.5)}} $$

In converting to proportions we divided both numerator terms by 64 (or N), 
and if we also divide the denominator by 64 (or N) we will not change the 
value of x/σ. Thus we have 

$$ \frac{x}{\sigma} = \frac{42/64 - 32/64}{\sqrt{64(.5)(.5)/(64)^2}} = \frac{.656 - .50}{.062} = 2.52 $$

which differs from 2.50 only because of rounding errors. This implies that 
dividing by N somehow preserves the x/σ nature of the result. The 
numerator, or x, is a deviation, the deviation of an observed sample 
proportion from a hypothetical proportion. We might, therefore, deduce 
that the denominator is a σ, but σ of what? 

Let f = number of yeses (frequency of yeses); f can vary from zero to N, 
with μ_f = Np and σ_f = √(Npq). If we divide every possible f by N we 
have proportions. The mean in proportion units will be μ_f/N = Np/N, or 
simply p, and by a principle hinted at on page 25 the standard deviation 
in proportion units will be σ_f/N = √(Npq)/N = √(pq/N). This last term is 
precisely what we had previously as the denominator, hence as a σ it is the 
standard deviation of a distribution of proportions; we may symbolize 
this as σ_p. 

In summary, we have np and √(npq) as the mean and σ for a chance 
distribution of successes (on n coins or n dice or n trials, etc.). We have 
Np and √(Npq) as the mean and σ of a chance distribution of number of yes 
responses for N individuals. We have p and √(pq/N) as the mean and σ of a 
chance distribution of proportion of yeses based on samples of N individ- 
uals. In the coin tossing and analogous situations, each toss or trial leads 
to a countable number of successes, and the distribution of the number of 
successes for successive trials follows the binomial. For the polling 
situation, each sample of N cases leads to a calculable proportion of yeses, 
and the distribution of proportions for successive samples (of same size) 
also tends to follow the binomial. Such a distribution is referred to as the 
random (chance) sampling distribution of proportions. 
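
The equivalence of the frequency form and the proportion form can be written in one line. The display below only restates the argument above in LaTeX, with f the number of yeses; the numerical check uses the unrounded proportion 42/64 = .65625, which is why it returns exactly 2.50.

```latex
\[
\frac{x}{\sigma}
  \;=\; \frac{f - Np}{\sqrt{Npq}}
  \;=\; \frac{(f - Np)/N}{\sqrt{Npq}/N}
  \;=\; \frac{f/N - p}{\sqrt{pq/N}},
\qquad
\frac{42 - 32}{4} \;=\; \frac{.65625 - .50}{.0625} \;=\; 2.50 .
\]
```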

It is customary to refer to σ_p as the standard error of a proportion. The 
term "error" is used here because, in effect, we are specifying the variability 
due to chance (sampling) error. Actually, the sampling distribution of 
proportions is a theoretical distribution; we usually have just one sample 
proportion (or a few at most). Statistical theory provides us with informa- 
tion concerning the central value, variability, and shape of the distribution 
to be expected if we did have a very large number of sample proportions. 

The scheme outlined previously for testing hypotheses is not, of course, 
restricted to the cigarette blindfold test and the polling situation. In the 
first place the p for the binomial need not be 1/2; our setup might involve 
a p of 1/3 (e.g., identifying 1 of 3), nor are we confined to the hypothesis of 
50-50 split when polling (e.g., we might be interested in whether there is a 
2 to 1 split). In the second place, we need not limit ourselves to number of 
successes or number of yeses. The fundamental requirement is that we be 
able to categorize observations (or individuals) into two classes (a dichot- 
omy) such as pass or fail, agree or disagree, like or dislike, present or 
absent, cured or not cured, etc. 

When a hypothesis involving a proportion is tested, the general pro- 
cedure is to express the observed proportion, p_ob, as a deviation from p_h, 
the proportion expected on the basis of a statistical hypothesis, then to 
divide this deviation by 

$$ \sigma_p = \sqrt{\frac{p_h q_h}{N}} \qquad (5.1) $$


This gives a z, sometimes called a critical ratio (CR), which for N not too 
small and p_h not too extreme will follow the unit normal curve, the table of 
which permits us to ascertain the probability of a deviation as great as that 
observed. Note that the proviso that p_h cannot be extreme follows from 
the fact that the binomial distribution is skewed when p is extreme, say 
when p is greater than .90 or less than .10 (see the formula for skewness on 
p. 43). Since the skewness is also a function of n, it follows that any rule 
that we might adopt to prevent unjustifiable use of the normal curve 
approximation will be a function of N and p_h. In general when both Np_h 
and Nq_h exceed 5 we can safely use the normal curve; if either product is 
between 5 and 10 we should deduct .5/N from the numerical value of the 
deviation of p_ob from p_h. This is another way of incorporating the correc- 
tion for continuity (p. 45). 
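
The rule of thumb above lends itself to a short function. The following is a sketch under stated assumptions, not a definitive procedure: the name `critical_ratio` is hypothetical, the boundary handling (a product of exactly 10 still receives the correction, a product of 5 or less is refused) is one reading of "between 5 and 10," and the second call uses invented numbers.

```python
from math import sqrt

def critical_ratio(p_ob, p_h, n):
    """z for an observed proportion against a hypothesized one, formula (5.1).

    Assumption: the .5/N correction is applied when the smaller of
    N*p_h and N*q_h is above 5 but not above 10; products of 5 or
    less raise an error because the normal curve is not advised.
    """
    q_h = 1.0 - p_h
    smaller = min(n * p_h, n * q_h)
    if smaller <= 5:
        raise ValueError("N*p_h and N*q_h should both exceed 5")
    deviation = abs(p_ob - p_h)
    if smaller <= 10:
        # correction for continuity in proportion units
        deviation = max(deviation - 0.5 / n, 0.0)
    sigma_p = sqrt(p_h * q_h / n)
    z = deviation / sigma_p
    return z if p_ob >= p_h else -z

z_poll = critical_ratio(42 / 64, 0.5, 64)   # polling example; no correction needed
z_small = critical_ratio(0.5, 0.3, 30)      # hypothetical case with N*p_h = 9
print(z_poll, round(z_small, 2))
```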

Formula (5.1) for σ_p has been written with p_h as a value specified by the 
hypothesis to be tested. As such the formula measures the chance 
variation in proportions when the hypothesis is true. Actually, saying, 
“if there is a 50-50 split in opinion,” is the same as saying “if the proportion 
of yeses is .50 in the population.” If we let p_pop stand for population 
proportion then the variation of sample proportions is given by substituting 
p_pop (and q_pop) in (5.1). When we have an obtained proportion, p_ob, and do 
not know p_pop (usually the case) and have no hypothesis in mind, we use 
p_ob as an estimate of p_pop and 

$$ s_p = \sqrt{\frac{p_{ob}\, q_{ob}}{N}} \qquad (5.2) $$

as an approximation of the standard error of an observed proportion. 
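
As a worked instance of (5.2), the polling sample used earlier (42 yeses in 64) may be substituted; this substitution is offered only as an illustration and is not an example from the text. The result may be compared with the .0625 obtained from (5.1) under the 50-50 hypothesis.

```latex
\[
s_p \;=\; \sqrt{\frac{p_{ob}\, q_{ob}}{N}}
    \;=\; \sqrt{\frac{(.656)(.344)}{64}}
    \;\approx\; .059
\]
```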

At this point the student may be somewhat confused by the use of p, 
first as the probability of, say, success on a single element and then as a 
proportion. Note, however, that if we were told that .30 (a proportion) of 
a given group have brown eyes, we could say that the probability that a 
randomly selected person has brown eyes is .30. Furthermore, when we 
say that the probability of rolling a snake eye is 1/6 or .1667, we mean that 
the proportion of snake eyes for a large number of rolls will tend to be 
.1667. 

Some sampling theory. To facilitate later discussion we shall now 
introduce some notions of sampling theory. We will confine our attention 
to what is known as simple random sampling. The conditions for random 
sampling are that each individual (person, plant, animal, observation, etc.) 
in a defined population (universe, or supply) shall have an equal chance of 
being included in the sample, and that the drawing of one individual shall 
in no way affect the drawing of another (that is, the drawings must be 
independent of each other). The first condition is not easily met in practice. 
The aim is, of course, to obtain a sample which will be, within limits of 
random or chance errors, representative of the population from which it is 
drawn. 

When dealing with attributes, or the classification of individuals into 
two (or more) categories, for which the proportion in a given category is a 
useful descriptive measure, we can conceive of a population proportion, 
p_pop, and a proportion, p_ob, obtained on a random sample of N cases. Now 
if we could draw successive samples of N, determine p_ob for each sample, 
and then make a distribution of the several p_ob values, we would expect this 
distribution to follow the normal curve for N not small and p_pop not 
extreme. This follows from our discussion of the binomial distribution 
and normal curve approximation thereto, the only difference being that 
we were then speaking of a chance distribution about some hypothetical 
proportion, p_h. If p_h happened to equal p_pop we would be dealing with 
precisely the same distribution of sample values. If, for example, the 
hypothesis of a 50-50 split is true we would expect the distribution of 
successive sample proportions to center at .5 and have σ_p = √(p_h q_h/N) 
= √((.5)(.5)/N); if the population proportion, p_pop, is .5 we would expect the 
successive sample proportions to have a mean of .5 and σ_p = √(p_pop q_pop/N) 
= √((.5)(.5)/N). 
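
The idea of drawing successive samples can be imitated by simulation. The sketch below is only an illustration under assumptions of my own choosing: a population proportion of .5, samples of N = 64, 20,000 samples, and a fixed seed so that the run is repeatable. The function name `sample_proportions` is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def sample_proportions(p_pop, n, samples, seed=0):
    """Draw `samples` independent random samples of size n; return each proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_pop for _ in range(n)) / n
        for _ in range(samples)
    ]

props = sample_proportions(p_pop=0.5, n=64, samples=20000)
observed_mean = mean(props)            # should lie near p_pop
observed_sd = pstdev(props)            # should lie near sqrt(pq/N)
theoretical_sd = sqrt(0.5 * 0.5 / 64)  # .0625

print(round(observed_mean, 3), round(observed_sd, 4), theoretical_sd)
```

Each simulated response is drawn independently, which corresponds to the second condition for simple random sampling (independent drawings) from an effectively unlimited population.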


DIFFERENCES BETWEEN PROPORTIONS 


The testing of hypotheses need not be confined to a single proportion. 
This is fortunate because in research involving attributes we are more apt 
to have two proportions, and since each is subject to chance (sampling) 
error, it follows that the difference between them will also be subject to 
chance error. To test a hypothesis regarding the difference between two 
proportions it will be necessary that we have information concerning the 
theoretical random (chance) sampling distribution of the differences 
between proportions. We will need to distinguish two different types of 
situations: (1) proportions based on two samples drawn independently 
from two populations and (2) proportions for responses or observations 
obtained under two different conditions on just one sample. For either 
situation we set up a statistical hypothesis known as a null hypothesis. This 
hypothesis, which states that there is no difference between the population 
proportions, will be rejected if the obtained difference reaches some pre- 
scribed level of significance but will be accepted otherwise. Stated 
differently, if the observed difference could readily arise on a chance basis 
we accept the null hypothesis; if the probability of its occurrence by chance 
is small we reject the null hypothesis. Note that our statistical hypothesis 
of no difference may be, and often is, diametrically opposed to the research 
hypothesis being checked by the data. That is, on the basis of theory or 
prior observations we may expect a difference, yet for statistical reasons 
we set the null hypothesis. If the obtained difference is statistically signifi- 
cant in the expected direction we regard the data as tending to support 
the research hypothesis. 


Nonindependent proportions. We shall consider first the situation in 
which the two proportions being compared are not based on independent 
groups but on just one group (or on two related groups). Suppose we are 
interested in whether a movie leads to a change of opinion, i.e., to an 
increase in the proportion favorable to some issue. We select a random 
sample from some defined population, get a yes (favorable) or no (unfavor- 
able) response from each individual, show them the movie, then again get 
a yes or no response from each. Our next step is that of tabulation and, 
since we are concerned with possible changes in opinion, we will need to 
arrange our tabulation so as to show how many changed from no to yes, 
how many from yes to no, and how many "stood pat." This can be done 
by placing tally marks in a 2 by 2, or fourfold, table such as that depicted in 
Table 5.2. For an individual who gave a yes response the first time and a 
yes response the second time, a tally would go in the upper right-hand cell; 
for a yes at first followed by a no, a tally would go in the upper left 
quadrant; and so on. Let A, B, C, and D represent the respective fre- 
quencies for yes-no, yes-yes, no-no, and no-yes responses. Then A + B is 
the total number of yeses at first and B + D is the total number of yeses the 
second time. If each of these totals is divided by N, we will have the pro- 
portions of yeses, p_1 and p_2, respectively, for the first (or pre-) and the 
second (or post-) set of responses. (Note: the right-hand part of Table 5.2 
is obtained by dividing the 8 frequencies in the left-hand part by N.) 

Table 5.2. Tabulation plan for handling proportions based on 
the same individuals 

                 Frequencies                      Proportions 
                     2nd                              2nd 
                 No      Yes                      No      Yes 
 1st   Yes        A       B     A + B              a       b     p_1 
       No         C       D     C + D              c       d     c + d 
                A + C   B + D     N              a + c    p_2     1 

Before proceeding to develop a scheme for testing the statistical 
significance of the difference between the proportions, p_1 and p_2, let us note 
that p_1 and p_2 can differ only in case the frequency A differs from the 
frequency D, since p_1 = (A + B)/N and p_2 = (B + D)/N have B in 
common. Our null hypothesis is that the movie produces no change, i.e., 
that if the movie could be shown to the entire defined population, the 
proportion of yeses before and after would be exactly the same. This does 
not mean that an individual cannot change, but it does mean that the 
number of changes from yes to no balances off the number of changes from 
no to yes. Thus we come to the proposition that on the basis of the null 
hypothesis we would expect those individuals who gave a changed response 
to split 50-50 as to direction of change. Stated differently, we would expect 
1/2 of the A + D individuals (the changers) to change from yes to no and 
1/2 of them to change from no to yes. 

Since this is precisely analogous to tossing A + D coins, we would 
expect that when A + D persons change, the chance distribution of no to 
yes changes would follow, under null hypothesis conditions, the binomial 
distribution with mean of (A + D)/2 and σ = √((A + D)(.5)(.5)); that is, 
with n = A + D and p = 1/2. Note that for A + D fixed the number of 
yes to no changes is complementary to the number of no to yes changes, 
just as when coins are tossed the number of tails is complementary to the 
number of heads; we need not count both. Thus a test of the significance 
of the deviation of either D or A from (A + D)/2 tells us whether D differs 
significantly from A. 


For A + D small, say 10 or less, we may use the actual binomial 
expansion to evaluate the change, but for A + D large we will need to 
resort to the normal curve approximation. The latter is readily accom- 
plished by expressing D as a deviation from (A + D)/2 and dividing by σ, 
or by √((A + D)(.5)(.5)), which gives a critical ratio, 

$$ z = \frac{x}{\sigma} = \frac{D - (A + D)/2}{\sqrt{(A + D)(.5)(.5)}} = \frac{.5D - .5A}{.5\sqrt{A + D}} = \frac{D - A}{\sqrt{A + D}} \qquad (5.3) $$

as a value with which to enter Table A to find the probability of as large 
a deviation on a chance basis. If z is 2.58 (or larger) the P = .01 level 
of significance has been reached. The approximation is appreciably improved by 
deducting 1 from the absolute value of D - A; this is the correction for 
continuity again. 

We may also bring the sample size, N, into the picture. Dividing both the 
numerator and the denominator of (5.3) by N gives 

$$ z = \frac{x}{\sigma} = \frac{D/N - A/N}{\sqrt{(A + D)/N^2}} $$

If we let a = A/N and d = D/N, this may be written as 

$$ z = \frac{x}{\sigma} = \frac{d - a}{\sqrt{(a + d)/N}} \qquad (5.4) $$


This form for x/σ will make more sense if we again consider Table 5.2, 
particularly the right-hand part. Note that since a + b = p_1 and b + d 
= p_2, it follows that d - a = p_2 - p_1 and accordingly a test of the signifi- 
cance of D as a deviation from (A + D)/2 is also a test of the significance 
of the difference between the proportions of yeses obtained on the two 
occasions. 

To incorporate the correction for continuity, deduct 1/N from the 
absolute value of d — a. 
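
Formulas (5.3) and (5.4) and the correction for continuity can be compared on a single fourfold table. The sketch below is illustrative: the function names are hypothetical and the counts A, B, C, D are invented for the purpose, not data from the text.

```python
from math import sqrt

def correlated_z(a_count, d_count, continuity=False):
    """Formula (5.3): z = (D - A) / sqrt(A + D).

    A = yes-to-no changes, D = no-to-yes changes. With continuity=True,
    1 is deducted from the absolute value of D - A (sign preserved).
    """
    diff = d_count - a_count
    if continuity:
        magnitude = max(abs(diff) - 1, 0)
        diff = magnitude if diff >= 0 else -magnitude
    return diff / sqrt(a_count + d_count)

def correlated_z_from_proportions(a, d, n):
    """Formula (5.4): z = (d - a) / sqrt((a + d) / N), with a = A/N, d = D/N."""
    return (d - a) / sqrt((a + d) / n)

# Hypothetical tallies: yes-no, yes-yes, no-no, no-yes
A, B, C, D = 5, 40, 40, 15
N = A + B + C + D
p1, p2 = (A + B) / N, (B + D) / N      # proportions of yeses, first and second time

z_freq = correlated_z(A, D)
z_prop = correlated_z_from_proportions(A / N, D / N, N)
z_corr = correlated_z(A, D, continuity=True)
print(round(z_freq, 3), round(z_prop, 3), round(z_corr, 3))
```

The frequency form and the proportion form agree apart from floating-point rounding, as the algebra requires, and p2 - p1 equals d - a.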

The denominator of the right-hand side of (5.4) must be a standard 
deviation. Of what? Actually it is the standard deviation of the theoretical 
sampling distribution of differences between proportions, each difference 
being based on one sample of size N. Such a standard deviation, as we have 
noted previously, is referred to as a standard error. Thus we have 

$$ \sigma_{D_{p_r}} = \sqrt{\frac{a + d}{N}} \qquad (5.5) $$

as the standard error of the difference between correlated proportions. 
The subscript r has been added to indicate that this formula holds for 
related or correlated proportions. The relationship, or correlation, concept 
needs a brief word of explanation. If, by chance sampling, p_1 were lower 
than the population value, we would expect p_2 also to be somewhat low; 
if p_1 were by chance high, we would expect p_2 to be somewhat high; if p_1 
were near the population value (near average), we would expect p_2 to be 
near average. This varying together is referred to as a co-relationship or 
correlation. Stated differently, we would not expect the two proportions to 
vary independently of each other for successive samples. 

The proportions need not be based on the same individuals to be 
correlated. For example, if we were interested in sex differences in opinion 
we might randomly choose families and then ascertain the proportion of 
yeses among the husbands and also among the wives; for successive 
samplings the two proportions might be correlated because of a possible 
tendency for husbands and wives to agree on the given issue. As a second 
example, consider the setup involving the pairing of individuals for the 
purpose of having comparable experimental and control groups. The fact 
of pairing signifies that the two groups have not been drawn independently 
in the sampling sense; hence there might be a tendency for the proportions 
based on the two groups to be more or less alike. (About pairing we will 
have more to say in Chapter 6.) 

Another instance for which formulas (5.3), (5.4), and (5.5) are applicable 
is the problem of judging the significance of the difference between pro- 
portions of yeses for two different questions asked of the same sample of N 
cases. Since the responses to the two questions might tend to vary together 
there could be a correlation between the proportions on successive samp- 
lings. 

In each of the foregoing situations we have pairs of responses, and our 
tabulation must follow the scheme set forth in Table 5.2; i.e., our tabula- 
tion will lead to the frequency of yes-no, yes-yes, no-no, and no-yes 
responses. 

Formulas (5.3), (5.4), and (5.5) are usable in other situations. When 
judging whether or not two test items differ significantly in difficulty we 
ordinarily have pass-fail data for both items on the same sample of N cases. 
Our tabulation leads to the frequencies for pass-fail, pass-pass, fail-fail, and 
fail-pass. The kind of response is irrelevant—it need only be such that a 
dichotomy is involved for each item or question. 

These formulas may be safely used for any size sample provided A + D is 
10 or more (and the correction for continuity is included when A + D is 10 
to 20). If A + D is less than 10, the binomial expansion provides an 
easily computed test of significance leading to an exact probability for as 
great a difference between the proportions as that observed. The P so 
obtained needs to be doubled to get the probability for as great a difference 
irrespective of direction; otherwise it is the probability for as large a 
difference in one direction only. About this we shall have more to say 
later under the heading, “One-tailed vs. two-tailed tests,” pp. 61-63. 
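
For A + D below 10 the exact test is a short sum of binomial terms. The following sketch assumes p = 1/2 for the direction of change, as the null hypothesis requires; the function name `exact_change_test` and the counts in the example are hypothetical.

```python
from fractions import Fraction
from math import comb

def exact_change_test(a_count, d_count):
    """Exact binomial probability for the split of the A + D changers.

    Returns (one_tailed, two_tailed): the chance probability of a split
    at least as uneven as that observed, in the observed direction and
    irrespective of direction (the doubled value, capped at 1).
    """
    n = a_count + d_count
    larger = max(a_count, d_count)
    one_tailed = Fraction(sum(comb(n, k) for k in range(larger, n + 1)), 2 ** n)
    two_tailed = min(2 * one_tailed, Fraction(1))
    return one_tailed, two_tailed

# Hypothetical result: 1 change from yes to no, 7 changes from no to yes
one, two = exact_change_test(1, 7)
print(one, two, round(float(two), 3))
```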

Independent proportions. It is not easy to build up a general formula 
for evaluating the difference between two proportions based on two 
independent samples. We can, however, learn something about formula 
construction and, incidentally, illustrate a general statistical theorem by 
considering a special case involving differences between independent 
proportions. 

We have already seen how the binomial expansion, (p + q)^n, can be used 
as a basis for ascertaining theoretical, or expected, frequencies for various 
possible outcomes (events). Let us now see whether we can set up expected 
frequencies for the joint occurrence of events. Suppose persons J and K 
decide to while away some time at coin tossing. Each uses n = 5 coins, for 
which the binomial yields expected frequencies of 1, 5, 10, 10, 5, 1 for 
5, 4, 3, 2, 1, 0 heads, with mean = np = 2.5 and σ² = npq = 5/4. But 
instead of making just 32 tosses, each makes 1024 tosses, for which the 
expected frequencies would be 32 times the 1, 5, 10, 10, 5, 1, or 32, 160, 
320, 320, 160, 32. 

J and K decide to make simultaneous tosses in order to learn something 
about joint outcomes, that is, to see how often both get 5 heads or how 
often J gets 4 heads while K gets 3 heads, and so on. Now a little thought 
will show that the number of possible joint outcomes will be 6 times 6, or 36. 
To keep a record of their results, J and K would be wise to 
lay out a 6 by 6 table with 0 to 5 (heads) along the bottom and also along 
the left-hand side. When a particular combination occurs, say, 2 heads by 
J and 4 by K, a tally mark is entered in the cell to the right of 2 and above 4 
(enter with J's along the ordinate and K's along the abscissa). 

Can we anticipate the frequencies in the 36 cells of the table? This we 
cannot do, but we can specify the theoretically expected frequencies in 
either of two ways. The first method involves use of the multiplication 
theorem of probability. The probability of J obtaining 5 heads is 1/32; 
the probability of K obtaining 5 heads is also 1/32. The product of these 
two is 1/1024, which permits us to enter a 1 in the upper-right cell as the 
expected number of times (out of 1024 simultaneous tosses) that each gets 
5 heads. The probability of the joint outcome, J 2 heads and K 4 heads, is 
10/32 times 5/32, or 50/1024, which permits us to enter 50 as the expected 
frequency in the cell defined by 2 along the left and 4 along the bottom. 
Each of the other 34 cells can be similarly filled in by the multiplication 
theorem. The second method is simpler. For the 32 times we expect J to 
get 5 heads, we would expect K's results to follow the binomial, hence we 
can immediately write down 1, 5, 10, 10, 5, 1 in the top row of the 6 by 6 
table. For the 160 times we expect J to obtain 4 heads we would again 
expect K's outcomes to follow the binomial but, since 160 is five times 32, 
we would need to multiply the 1, 5, 10, 10, 5, 1 by 5, giving 5, 25, 50, 50, 25, 
5 as entries in the second row in the 6 by 6 table. By exactly the same line 
of reasoning the other rows can easily be filled in, with results as shown in 
Table 5.3. 

Table 5.3. Expected frequencies for joint outcomes when J and K 
each make 1024 simultaneous tosses of 5 coins 

        E_K    32   160   320   320   160    32    E_J 
        5H      1     5    10    10     5     1     32 
        4H      5    25    50    50    25     5    160 
  J     3H     10    50   100   100    50    10    320 
        2H     10    50   100   100    50    10    320 
        1H      5    25    50    50    25     5    160 
        0H      1     5    10    10     5     1     32 
               0H    1H    2H    3H    4H    5H   1024 
                              K 
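
The first method, the multiplication theorem, is easy to mechanize. The sketch below is an illustration with hypothetical names (`joint_expected`, `N_COINS`, `TOSSES`); it assumes fair coins and uses integer arithmetic, which is exact here only because 1024 is a multiple of 32 times 32.

```python
from math import comb

N_COINS = 5
TOSSES = 1024  # 32 * 32, so every expected frequency is a whole number

def joint_expected(n=N_COINS, tosses=TOSSES):
    """Expected frequency of each (J heads, K heads) pair for fair coins."""
    ways = [comb(n, k) for k in range(n + 1)]   # 1, 5, 10, 10, 5, 1
    total_ways = 2 ** n                          # 32
    return {
        (j, k): tosses * ways[j] * ways[k] // (total_ways * total_ways)
        for j in range(n + 1)
        for k in range(n + 1)
    }

table = joint_expected()
for j in range(N_COINS, -1, -1):                 # 5H in the top row, as in Table 5.3
    print(f"{j}H", *(f"{table[(j, k)]:4d}" for k in range(N_COINS + 1)))
```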

When a particular cell frequency in Table 5.3 is divided by 1024 we have 
a probability for a joint occurrence. Another way of interpreting a particu- 
lar cell frequency is to regard it as a mean value in the sense that if J and K 
performed a very, very large number of series of 1024 tosses we would 
expect the average of the obtained frequencies for that cell to correspond 
to the given theoretically expected frequency. That is, any expected 
frequency is to be regarded as the mean over an infinitely large number of 
series. 

We built up Table 5.3 for the ultimate purpose of saying something 
about the difference between independent proportions. Suppose J and K 
decide to make two additional tabulations for each pair of simultaneous 
tosses: the sum of their separate outcomes, that is, the number of heads 
for all 10 coins; and also the difference in number of heads, expressed 
arbitrarily as J’s count minus K's count. Thus for tabulating the sum of 
their results they would need “intervals” 10H, 9H, ..., 1H, 0H, whereas for 
the difference they would need +5, +4, ..., 0, ..., -4, -5. Again, let us 
attempt to determine the expected results. 

It is easy to write down the expected frequencies for the various outcomes
as sums; these would simply come from the binomial $(p + q)^{10}$. We can,
however, write them from Table 5.3. A sum of 10 (heads) can occur only
when both J and K obtain 5 heads, for which the expectation is 1 out of
1024. A sum of 9 can occur either when J gets 5 and K gets 4 or when J gets
4 and K gets 5. Since the expectation for each of these is 5, the expectation
for 9 as a sum becomes 10. A sum of 8 results from 5 and 3, 4 and 4, or
3 and 5 for J and K respectively, and these joint outcomes have expectations
of 10, 25, and 10, which add to 45. Note now that diagonal adding,
upper-left to lower-right in Table 5.3, will lead to 1, 10, 45, 120, 210, 252,
210, 120, 45, 10, 1 as expected frequencies for the possible outcomes when
J and K sum the results for each of their simultaneous tosses.


As to the difference in “scores,” when J gets 5 heads and K none we
have a difference of +5 for which the expectation is 1 (out of 1024). A
difference of +4 can arise when J gets 5 and K gets 1 or when J gets 4 and
K gets none; summing the two expectations gives 5 + 5 = 10 as the expected
number of times for a difference of +4. A difference of +3 can occur in
three ways with expectations of 10, 25, and 10, which add to 45 as the
expected frequency for a difference of +3. Note that we are again summing
diagonally in Table 5.3, this time from lower-left to upper-right.
The results both for sums and for differences, given in Table 5.4, are
worth scrutiny. The two distributions are identical except for their
location parameters, the mean being 5 for one and 0 for the other.
Obviously, the variances are equal. The fact that the differences have a mean
of 0 might have been anticipated, since every time J and K toss their 5
coins, each is, in effect, making a trial, a trial which represents a sample.
But each is sampling from the same universe, the universe of events when
5 coins are tossed. (It is presumed that the coins are unbiased.) J and K’s
“universes” have the same mean (np = 2.5); hence in the long run it would
be expected that chance will operate in such a way that the average of
obtained differences will be zero.
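
The diagonal adding just described can be checked mechanically. The following is a minimal sketch, assuming fair coins and using names of our own choosing (`sums`, `diffs`); it accumulates each joint expected frequency under the appropriate sum and difference.

```python
from collections import Counter
from math import comb

binom = [comb(5, h) for h in range(6)]  # 1, 5, 10, 10, 5, 1

sums, diffs = Counter(), Counter()
for j in range(6):          # J's number of heads
    for k in range(6):      # K's number of heads
        freq = binom[j] * binom[k]   # expected frequency out of 1024
        sums[j + k] += freq          # one diagonal direction of the table
        diffs[j - k] += freq         # the other diagonal direction

sum_freqs = [sums[s] for s in range(10, -1, -1)]    # sums 10 down to 0
diff_freqs = [diffs[d] for d in range(5, -6, -1)]   # differences +5 down to -5

print(sum_freqs)
print(diff_freqs)
```

Both lists come out the same, which is the identity of the two distributions (apart from location) noted in the text.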

Table 5.4. Expected frequencies (E) for sums (Σ) and differences
(D_h) for 1024 simultaneous tosses of two sets of 5 coins, and
differences in proportions (D_p)

                          Differences
    Sums          For heads       For proportions
   Σ      E       D_h      E       D_p       E
  10      1       +5       1      +1.0       1
   9     10       +4      10      +.8       10
   8     45       +3      45      +.6       45
   7    120       +2     120      +.4      120
   6    210       +1     210      +.2      210
   5    252        0     252       .0      252
   4    210       -1     210      -.2      210
   3    120       -2     120      -.4      120
   2     45       -3      45      -.6       45
   1     10       -4      10      -.8       10
   0      1       -5       1     -1.0        1

Chance will also operate to produce variability in the differences, the
standard deviation of which can be specified. We have seen that the
variance of the difference is equal to the variance of the sum. The variance of
the sum is nothing more than the variance of the distribution of heads when
10 coins are tossed an infinite number of times, hence the variance of the
difference is also simply npq = 10(.5)(.5). Note that 10(.5)(.5) = 5(.5)(.5)
+ 5(.5)(.5). In general, when $n = n_1 + n_2$ we can say that the variance of
the sum will be the sum of the separate variances, i.e., $npq = n_1pq + n_2pq$.
At this point, it should be obvious to the student that the variance of the
sum of heads obtained on an infinite number of simultaneous tosses for any
values of $n_1$ and $n_2$, not necessarily equal, will be given by summing the
separate variances. It is not obvious that this also holds generally for the
variance of the differences. Later we will have an algebraic proof, showing
that the variance of a sum (or difference) is always equal to the sum of the
separate variances when the events (scores) being summed are independent.
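
Pending the algebraic proof, the claim can at least be verified numerically for particular cases. The sketch below is illustrative only: the helper names `dist_of` and `variance` are hypothetical, fair coins (p = q = .5) are assumed, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from math import comb

def dist_of(combine, n1, n2):
    """Exact distribution of combine(j, k) when J tosses n1 fair coins
    and K tosses n2 fair coins independently."""
    total = 2 ** (n1 + n2)
    dist = {}
    for j in range(n1 + 1):
        for k in range(n2 + 1):
            value = combine(j, k)
            prob = Fraction(comb(n1, j) * comb(n2, k), total)
            dist[value] = dist.get(value, 0) + prob
    return dist

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum(p * (v - mean) ** 2 for v, p in dist.items())

var_sum = variance(dist_of(lambda j, k: j + k, 5, 5))
var_diff = variance(dist_of(lambda j, k: j - k, 5, 5))
var_diff_unequal = variance(dist_of(lambda j, k: j - k, 3, 7))

print(var_sum, var_diff, var_diff_unequal)
```

All three values equal 10(.5)(.5), including the case of unequal numbers of coins (3 and 7, chosen arbitrarily for illustration), where the variance of the difference is 3(.5)(.5) + 7(.5)(.5).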

In Table 5.4 we have a chance expected, or random, sampling distribution
of differences in number of heads, with μ = 0 and σ = √[10(.5)(.5)].
Suppose that J and K changed their “scoring” system from number of
heads per toss to the proportion of heads per toss by simply dividing the
former by n = 5. Thus, they would have a scale running as .0, .2, .4, .6, .8,
1.0 along the ordinate and abscissa of Table 5.3. The differences between
these proportions as scores would run +1.0, +.8, +.6, +.4, +.2, .0, -.2,
-.4, -.6, -.8, -1.0, as shown in the not yet discussed right-hand part of
Table 5.4. Note that in changing the scale from number of heads per toss to
proportion of heads per toss, both J and K divided the former by n = 5.
Note further that the scale for differences in proportions ($D_p$) in Table 5.4
can be obtained by dividing the $D_h$ scale values (center of the table) by
n = 5. This change of scale leaves μ = 0 unchanged; however, the
standard deviation is changed: $\sigma_{D_p} = \tfrac{1}{5}\sigma_{D_h}$. More generally, if J and K each
toss n coins (or roll n dice) an infinite number of times, the variance of the
random sampling distribution of the differences, in proportion units, for
their simultaneous tosses (or rolls) will be given by

$$\sigma_{D_p}^2 = \frac{1}{n^2}\,\sigma_{D_h}^2 = \frac{1}{n^2}\,(npq + npq) = \frac{pq}{n} + \frac{pq}{n}$$

The foregoing rather lengthy development shows one way of arriving at
a formula for the variance of the sampling distribution of the differences
between independent proportions under the specified conditions, but these
conditions ($n_1 = n_2 = n$, and known p) are seldom, if ever, encountered in
research work. In practice we will have two proportions, $p_1$ and $p_2$, based
on $N_1$ and $N_2$ cases. Both $p_1$ and $p_2$ will be subject to sampling variation,
hence their difference will also be influenced by sampling error. We will
not know the two population proportions necessary for specifying exactly
the standard errors for $p_1$ and $p_2$ and for their difference. We must,
therefore, resort to estimation. For this purpose we will assume the null
hypothesis to be true; if true, the proportions for the populations will be
the same. The best available estimate for this unknown common population
proportion will be obtained by pooling the two samples, i.e., by
taking $p_c$, the proportion for the two samples combined, as the estimate.
Then with $q_c = 1 - p_c$ we take the following as our estimate of the
standard error of the difference between two independent proportions:

$$s_{D_p} = \sqrt{\frac{p_c q_c}{N_1} + \frac{p_c q_c}{N_2}} = \sqrt{p_c q_c \left(\frac{1}{N_1} + \frac{1}{N_2}\right)} \qquad (5.6)$$

The value of $p_c$ is readily obtained by combining the two frequencies of yeses
(or whatever the given category is) and dividing by the combined number of
cases, $N_1 + N_2$, and as usual $q_c = 1 - p_c$. An observed difference divided
by $s_{D_p}$ will give a z interpretable as a unit normal curve deviate provided
the Ns are not too small and $p_c$ is not too extreme. The rule-of-thumb is
that $p_c$ or $q_c$ (whichever is smaller) times $N_1$ or $N_2$ (whichever is smaller)
shall exceed 5. When this product is between 5 and 10, a correction for
continuity should be incorporated. This may be done by reducing the
numerical (absolute) value of the difference, $p_1 - p_2$, by the quantity
$\tfrac{1}{2}\left(\tfrac{1}{N_1} + \tfrac{1}{N_2}\right)$.


SOME GENERAL CONSIDERATIONS 


Before going further we should stop long enough to delineate the general 
problem of hypothesis testing, discuss the question of one-tailed vs. two- 
tailed tests, and consider the problem of what level of significance to adopt. 

Which hypothesis? In general, successive samplings will yield a 
sampling distribution of frequencies or of proportions or of differences 
between statistical measures or certain ratios (such as z or other ratios, to 
be discussed later). Hypotheses, whether statistical or research, are usually 
concerned either with differences or with deviations. By research hypoth- 
esis we mean the hypothesis set up on the basis of theory or prior 
observation or on logical grounds. Such a hypothesis usually involves a 
prediction regarding the outcome of an experiment. By statistical
hypothesis we usually mean a null hypothesis set up for the purpose of evaluating
the research hypothesis. 

When we are considering possible differences the null hypothesis,
frequently symbolized as $H_0$, is pitted against an alternate hypothesis, $H_1$.
Now $H_0$ specifies that, for example, $p_{\text{pop}(1)} = p_{\text{pop}(2)}$, or that two population
values do not differ, whereas $H_1$ might specify that $p_{\text{pop}(1)} > p_{\text{pop}(2)}$ or that
$p_{\text{pop}(1)} < p_{\text{pop}(2)}$ or that $p_{\text{pop}(1)} \neq p_{\text{pop}(2)}$. Which of these alternates is
appropriate depends on the research hypothesis to be tested by experiment
or what question is to be answered by experiment. An experiment is
carried out which yields sample values, $p_1$ and $p_2$, and the difference
between $p_1$ and $p_2$ is used to test $H_0$ against $H_1$; that is, on the basis
of the obtained difference we are to make a decision as to whether $H_0$ or
$H_1$ is true.

If $H_0$ is true we can specify the probability of obtaining by chance a
difference as great as $p_1 - p_2$ or as great as $p_2 - p_1$ or as great as the
numerical (irrespective of sign) difference, $|p_1 - p_2|$. Let α represent a
chosen level of significance, any level such as P = .10 or P = .05 or P = .01 or
P = .001. We reject $H_0$, the null hypothesis, if the probability of the
obtained result is as small as the chosen α, and this rejection implies the
acceptance of $H_1$. If α is not reached we accept $H_0$, but this acceptance
merely says that $H_0$ could be true; any of a whole series of differences near
zero could also be true. This acceptance-rejection business involves risks,
to be discussed under “Choice of level of significance.”

One-tailed vs. two-tailed tests. The three possible alternates listed
previously for $H_1$ have to do with hypotheses admissible on the basis of
either the research hypothesis or the question for which we seek an answer
by way of an experiment. In general, if $H_1$ states that $p_{\text{pop}(1)}$ does not equal
$p_{\text{pop}(2)}$, a two-tailed test is in order; if $H_1$ specifies which population value is
the larger, a one-tailed test is used. The issue as to whether we should use
a one-tailed test or a two-tailed test depends on whether the scientific
hypothesis being tested (or at times the practical decision to be made)
demands that we be concerned with chance deviations in just one direction
or in both directions. For situations in which we wonder whether a
performance is better than chance, as in blindfold cigarette discrimination,
we are concerned only with results in one direction, since any performance
in which the subject is successful on less than .50 of the trials leads us,
without further statistical ado, to accept the hypothesis that he cannot
discriminate better than chance. Thus a one-tailed test is appropriate.
But for situations in which we wish to decide whether a population is split
50-50 on some question, we need to consider chance sampling deviations in
both directions; hence we should use a two-tailed test.

Next consider the [...] between two proportions. [...] to some question for
a sample [...] Democrats as a basis for deciding [...] on the given issue, we
would [...] hypothesis of no difference [...] direction, has a probability [...]
level of significance, [...] to the decision to change drugs, [...] the idea of
adopting the one which [...] tailed test since significance in either [...]
though obtaining similar results [...] benefit comes about in that the z for,
say, the P = .01 level of significance need reach only 2.33 for a one-tailed
as compared with 2.58 for a two-tailed test. For the P = .05 level the
respective values are 1.64 and 1.96. In other words a difference, to be
significant, does not have to be as large for a one-tailed as for a two-tailed
test. Since the situation involving prediction is equivalent to taking $H_1$ as
the hypothesis that the difference between two population values is in a
specified direction, it is not only defensible to use a one-tailed test but
actually better in the sense that if there is a real difference in the predicted
direction it will be more apt to be detected by a one-tailed than by a
two-tailed test. However, a few words of caution are in order.

First, the prediction should be made prior to the collection of data, that 
is, independently of the data to be used in testing the hypothesis. Second, 
we must be on guard against habit—instances can be cited where an 
investigator after making a series of one-tailed tests failed to shift to a 
two-tailed test when he should have. Third, in case the results are signifi- 
cant in the direction opposite to the prediction, the investigator must, in 
effect, have a red face because the outcome is not consistent with either of 
the admissible hypotheses: no difference (as set forth by the null hypoth- 
esis) or a difference in the predicted direction (as set forth by the research 
hypothesis being tested). It is one thing to have results which simply fail 
to support a hypothesis, and quite a different thing to have an outcome 
which is diametrically opposed to the hypothesis. 
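
The z values quoted earlier as required for one-tailed and two-tailed tests can be recovered from the normal curve directly. A minimal sketch, using the standard library's `statistics.NormalDist`; the helper name `critical_z` is our own.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """z that cuts off alpha in one tail, or alpha/2 in each of two tails."""
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.05, 0.01):
    one = critical_z(alpha, tails=1)
    two = critical_z(alpha, tails=2)
    print(f"P = {alpha}: one-tailed z = {one:.2f}, two-tailed z = {two:.2f}")
```

For either level the one-tailed requirement is the smaller, which is the sense in which a one-tailed test is more apt to detect a real difference in the predicted direction.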

Choice of level of significance. How large should z be before the 
investigator claims significance? Asked differently, how does he choose α,
the value of P to be required for judging significance? There is no one 
answer to this question. For a long time psychologists insisted on a z of 
3.00 (equivalent to P = .003 level for two-tailed test) as a rule-of-thumb 
value for judging significance. There might be occasions when one would 
desire the assurance represented by a P of .003, but it should be noted that 
the acceptance of the null hypothesis whenever z does not reach 3.00 may 
lead too frequently to another type of erroneous conclusion. To under- 
stand this, we must consider what it means when an observed difference 
does not lead to the rejection of the null hypothesis. Acceptance of the 
null hypothesis does not prove that no difference exists. For example, a 
difference of 1 per cent, in number of yeses for two samples, which yields 
a z of .8 does not prove that there is no difference in the two universe 
values—it merely indicates that the real difference could easily be zero. 
However, the obtained difference of 1 could be a chance departure from a 
real difference of .5 or 1.2 or 1.8 or any of a whole series of values near 1. 
In other words, the null hypothesis is one which can be rejected but can 
never be proved; therefore to accept it too often because we insist on a 
high level of significance for rejection means that we are too apt to overlook 
real differences. This, plus the fact that we do not ordinarily need the 
assurance represented by a significance level of .003, would suggest that a z 
of 3.00 is too high. 

At the other extreme, a few are willing to accept as significant a difference 
which is 1.5 times its standard error. Since P = .13 (two-tailed) for a z
of 1.5, it is readily seen that such persons would all too frequently have 
their publics believing that chance differences are real. A less lax level, 
which is now generally accepted by psychologists, is represented by a P of 
.05. This also may be a rather low level of significance for announcing 
something as “fact.” Those writers who advocate the .05 level for research 
workers in psychology cite R. A. Fisher, an eminent statistician, as their 
authority, but they fail to point out that Fisher's applications are to experi- 
mental situations in agriculture and biology where there is far better control 
of sampling than is ordinarily the case in psychology. 
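
The probabilities attached to the z values mentioned in this discussion (3.00, 1.5, and the 1.96 that corresponds to the .05 level) can be computed rather than looked up. This is a small sketch with a helper name, `two_tailed_p`, of our own choosing.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a deviate at least as large as |z| in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.5, 1.96, 3.0):
    print(f"z = {z}: two-tailed P = {two_tailed_p(z):.3f}")
```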

If the findings of a study are to be used as the basis either for theory and 
further hypotheses or for social action, it does not seem unreasonable to 
require a higher level of significance than the .05 level. The answer as to 
what level, in terms of probability, should be adopted in order to call a 
finding statistically significant is not uninvolved. There is the balancing 
of risks: that of accepting the null hypothesis when to do so may mean the 
overlooking of a real difference against that of rejecting the null hypothesis 
which may lead to the acceptance of a chance difference as real. There is 
the question of the likelihood of independent verification, and, finally, 
there is the whim of personal preference: some individuals are more eager 
than others to announce a "significant" finding; others are more cautious. 
It follows that no hard and fast rule can be given; a finding may be inter- 
preted in terms of the probability of its occurrence by chance and then it 
may be noted whether the P is near the significance level adopted prior to 
the experiment because it seemed appropriate when all factors were weighed.

The reader will have noted from the foregoing that the testing of 
hypotheses involves the possibility of two types of erroneous conclusions. 
These are usually referred to as type I and type II errors, which we shall 
now more specifically define. Consider again the null hypothesis that no 
difference exists between two population values. If we reject this hypoth- 
esis when in fact it is true, we will have committed a type I error. If we 
accept the hypothesis when in fact it is false, we will have made a type II
error. Possible outcomes of our conclusion are shown in Table 5.5.

Table 5.5. Correct and incorrect statistical conclusions

                                  True situation
  Conclusion            No difference         Real difference
  Real difference       Type I error (α)      Correct (β)
  No difference         Correct (1 - α)       Type II error (1 - β)

The factors in choosing a level of significance might be further clarified
by a somewhat different approach. Notice that when we adopt P = α as our
level of significance we are definitely specifying the probability of
committing a type I error; it is simply α. By taking α smaller and smaller we can
reduce the risk of making a type I error. But what happens to the
probability of making a type II error as we thus reduce the risk of a type I error?
The answer, and the reasoning behind the answer, can readily be
understood provided one is willing to follow carefully the following line of
argument. Suppose we have the proportions of immunity in two samples
to which two drugs have been administered, and our question is whether
drug A is superior to drug B (a one-tailed test situation). Suppose further
that the standard error of the difference between the two proportions is .02.
The exposition will be somewhat simplified if we change to percentage units.
This is readily accomplished by shifting decimals for the proportions and
also for the standard error; the latter becomes 2 in percentage units.
Figure 5.2 shows a series of sampling distribution curves, all with σ = 2,
but with locations differing according to supposed true, or population,
differences of 0, 4, and 8. The top part (a) is for α = .10, the middle (b)
for α = .05, and the bottom (c) for α = .01. In each part an ordinate has
been erected at the difference required for significance at the given α level of
significance. These required differences spring from the fact that for a
one-tailed test the z values that cut off .10, .05, and .01 of a normal curve are
1.28, 1.64, and 2.33 respectively, and since σ is 2, the respective required
differences in percentages would be 2.56, 3.28, and 4.66. Sample
differences falling beyond these values would be in what are termed critical
regions for rejecting the null hypothesis at the three respective α values.
For example, values beyond 4.66 would be in the critical region when the
P = .01 level of significance is adopted.
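
The required differences of 2.56, 3.28, and 4.66 follow from multiplying each one-tailed z by the standard error. A sketch of the arithmetic, in which the z values are rounded to two decimals to mirror a printed normal-curve table (an assumption of ours about how the figures were obtained); `required_difference` is a hypothetical helper name.

```python
from statistics import NormalDist

SE_PERCENT = 2.0  # standard error of the difference, in percentage units

def required_difference(alpha, se=SE_PERCENT):
    """Smallest difference significant at level alpha on a one-tailed test."""
    z = round(NormalDist().inv_cdf(1 - alpha), 2)  # 1.28, 1.64, 2.33
    return round(z * se, 2)

required = {alpha: required_difference(alpha) for alpha in (0.10, 0.05, 0.01)}
print(required)
```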

From these several sampling distribution curves and with the help of a
table of the normal curve functions, we can specify the probability of
committing a type II error for a specified (supposed) true difference.
If we keep in mind that the probability of a type I error is α (= .10, .05, or
.01), and that we can make a type I error only when the true difference is
zero, we see that the proportionate areas beyond 2.56, 3.28, and 4.66 for
the three curves centering at zero represent the probabilities of making a
type I error for the respective α values. For all sample values in the regions
to the left of 2.56, 3.28, and 4.66 we would correctly accept the null
hypothesis when in reality it is true. The probabilities for correct
acceptance are given by 1 - α, or .90, .95, and .99 respectively.

[Fig. 5.2. Type I and type II errors. Three panels on a scale of
differences running from -4 to 12: (a) α = .10, z = 1.28, required
difference D = 2.56; (b) α = .05, z = 1.64, required difference D = 3.28;
(c) α = .01, z = 2.33, required difference D = 4.66.]

Let us now consider the supposition that the true difference is 4. If 4 is
the true difference, any obtained difference falling in the region to the right
of 2.56, 3.28, and 4.66 will, for the respective levels of significance, lead to
the correct decision that a true difference exists. The probabilities for
these correct inferences are obtained by expressing 2.56, 3.28, and 4.66 as
deviations from 4 (the supposed true value being considered), taking each
deviation relative to the standard error of the difference (= 2), and thus
obtaining z values of (2.56 - 4)/2 = -.72, (3.28 - 4)/2 = -.36, and
(4.66 - 4)/2 = .33. Looking these values up in a table of the normal
curve we get probabilities, for correctly rejecting the null hypothesis, of
.76, .64, and .37, for the respective specified levels of significance, when
the true difference is 4 percentage points. Probabilities for correctly
rejecting the null hypothesis have been (and are usually) symbolized by β.
Note that all sample values falling in the region to the left of 2.56, 3.28,
and 4.66 (for the curves centering at 4) will lead to the false acceptance of
the null hypothesis. The probabilities of making type II errors will
correspond to the proportionate areas, for the curves centering at 4, to the
left of these three points (when we have the one-tail test as considered
here). These probabilities will, of course, be given to us by 1 - β. Thus
we have .24, .36, and .63 as the probabilities of making a type II error,
when the true difference is 4 and for the .10, .05, and .01 levels of
significance. Note that taking α smaller and smaller increases the probability of
making a type II error.

For a true difference of 8, we can by a similar line of reasoning obtain
the probability of correctly rejecting the null hypothesis and the probability
of falsely accepting the null hypothesis, when using any one of the specified
values of α. These probabilities will involve the areas, under the curves
centering at 8, to the right of 2.56, 3.28, and 4.66 (for the βs) and to the left
of these same points (for the type II errors). The student can readily verify
that areas to the right of 2.56, 3.28, and 4.66 are approximately .997, .99,
and .95 respectively. Subtracting each of these from unity will yield the
probabilities, .003, .01, and .05, of falsely accepting the null hypothesis or
committing a type II error when the true difference is 8 and for αs of .10,
.05, and .01. Again, the smaller we take α the larger the probability of
making a type II error.

The probabilities given in the last two paragraphs, along with similar
figures for other supposed true differences, have been assembled in Table
5.6. A careful study of this table reveals the general rule that the smaller the
value of α the smaller the probability (β) of correctly rejecting the null
hypothesis and the larger the probability (1 - β) of committing a type II
error. Thus when we reduce the probability of making a type I error by
choosing α small, we do so at the risk of more often making a type II error.
Note also that regardless of α, the probability of making a type II error
decreases as the true differences deviate farther and farther from zero.
This is another way of saying that the larger the true difference the
more apt we are to detect it by experiment, and conversely the smaller the
difference the less likely we are to discover it.
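
The reasoning of the preceding paragraphs reduces to a single normal-curve area for each combination of α and supposed true difference. The sketch below recomputes the worked cases for true differences of 4 and 8, keeping this book's convention that β stands for the probability of correctly rejecting the null hypothesis. The names `CRITICAL`, `SE`, and `beta` are ours, and the critical differences are those given in the text; entries for other true differences can be obtained the same way, though they may differ slightly from tabled figures because of rounding.

```python
from statistics import NormalDist

# Required differences (percentage units) for a one-tailed test, from the text.
CRITICAL = {0.10: 2.56, 0.05: 3.28, 0.01: 4.66}
SE = 2.0  # standard error of the difference, in percentage units

def beta(true_diff, alpha):
    """Probability of correctly rejecting the null hypothesis (called beta in
    this text) when the true difference is true_diff."""
    z = (CRITICAL[alpha] - true_diff) / SE
    return 1 - NormalDist().cdf(z)   # area to the right of the critical point

for true_diff in (4, 8):
    for alpha in (0.10, 0.05, 0.01):
        b = beta(true_diff, alpha)
        print(f"true difference {true_diff}, alpha {alpha:.2f}: "
              f"beta = {b:.3f}, type II risk = {1 - b:.3f}")
```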

Incidentally, the value of β for various possible true differences is
referred to as the power of the statistical test for detecting the difference. If
we plotted the βs in, say, the α = .05 column of Table 5.6 against the scale
of possible differences, we would have an ascending curve which would
represent the power function of the test. It is beyond the scope of this book
to consider in detail the concepts having to do with the power of a test. It
should be remarked, however, that statistical tests differ in their power,
and to understand this we would need to have more information regarding
various tests that might be used to test a given research hypothesis. For
instance, power depends on the choice of the critical region for rejecting
the null hypothesis: for the first drug problem considered previously, a
one-tailed test is more powerful than a two-tailed test. In Chapter 6 we
will be considering, among other things, differences between averages or
central values, at which time it will be found that a test based on comparing
means will be more powerful than one based on medians.

Table 5.6. Probability (β) of correctly rejecting the null hypothesis and probability
(1 - β) of type II error associated with three levels of significance (αs of .10, .05, .01)
when certain true differences are supposed to exist

                          β                          1 - β
  True
  difference   α = .10    .05     .01      α = .10    .05     .01
       1         .22      .13     .03        .78      .87     .97
       2         .39      .26     .09        .61      .74     .91
       3         .59      .44     .20        .41      .56     .80
       4         .76      .64     .37        .24      .36     .63
       5         .89      .79     .57        .11      .21     .43
       6         .96      .91     .75        .04      .09     .25
       7         .99      .97     .88        .01      .03     .12
       8         .997     .99     .95        .003     .01     .05
       9        >.999     .997    .975      <.001     .003    .025
      10        >.999    >.999    .996      <.001    <.001    .004

Perhaps the discerning student will have noted that increasing sample
size (or sizes) tends to reduce standard errors. In the foregoing discussion
we supposed that we had Ns and proportions such that the standard error
of the difference ($\sigma_D$) was 2 percentage units. Quadrupling the Ns would
reduce the $\sigma_D$ to 1 percentage unit. How would this affect the results
deduced from Fig. 5.2 and set forth in Table 5.6? Take, for example,
α = .01 and suppose a true difference of 2 percentage points. With $\sigma_D$ = 1,
an obtained difference would have to fall in the region beyond 2.33 x 1
= 2.33 to be judged significant at the .01 level. With a true difference of 2,
the proportion of sample values falling beyond 2.33, calculated by taking
(2.33 - 2)/1 = .33 = z, is found to be .37. This is a β value to be
contrasted with a β of .09 given in Table 5.6. We see, therefore, that
quadrupling the sample Ns has increased fourfold the probability of detecting a
difference of 2 points. Or stated differently, the probability of a type II
error has been reduced from .91 to .63. The moral is plain: one way of
reducing the risk of making a type II error, without increasing the risk of a
type I error, is to increase N or Ns. Whether this is feasible will usually
depend on the resources available to the investigator.
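
The effect of quadrupling the Ns can be reproduced with the same normal-curve arithmetic. In this sketch the function name `detection_probability` is hypothetical, and the one-tailed z of 2.33 for the .01 level is taken from the text.

```python
from statistics import NormalDist

def detection_probability(true_diff, se, z_crit=2.33):
    """Probability that an obtained difference exceeds z_crit * se
    (one-tailed test) when the true difference is true_diff."""
    cutoff = z_crit * se
    return 1 - NormalDist().cdf((cutoff - true_diff) / se)

before = detection_probability(2, se=2)  # original Ns: standard error of 2
after = detection_probability(2, se=1)   # quadrupled Ns: standard error of 1

print(round(before, 2), round(after, 2))
```

Halving the standard error halves the required difference (from 4.66 to 2.33) while the supposed true difference stays at 2, which is why the chance of detection rises so sharply.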
Although contemporary mathematical statisticians usually consider
hypothesis testing in terms of a definite reject-accept decision according 
to whether the chosen level of significance is or is not reached, there is 
another possibility. We might follow the rule of rejecting the null hypoth- 
esis when P is less than .01 (say), accepting it when P is greater than .10, 
and reserving judgment when P is between .10 and .01. This, in effect, 
introduces a region of indecision, or calls for a postponement of decision 
until the experiment is repeated or more data are collected. Another possi- 
bility, when a decision is not required for some practical reason, is simply to 
report that a difference is significant at the .09 or the .04 or the .002 or 
whatever level is reached, and then let the reader evaluate the finding 
according to his own preferred level of significance (which he is apt to do 
anyway unless he is too naive). 

There are a couple of other points regarding significance. First, a 
statistically significant difference doesn’t necessarily mean a difference 
either of practical significance or of scientific import. Sometimes a “what 
of it” is not an impertinence. Second, the habit of merely checking to see 
whether a result reaches a chosen level of significance should not lead us to 
overlook the possibility of claiming, when appropriate, that a much higher 
level of significance was attained than the preresearch chosen level. 


SUMMARY 


In this chapter we have given a brief account of the concept of probability 
and have sketched procedures for applying probability notions in the 
testing of hypotheses involving frequencies and proportions (or
percentages). We have noted the conditions for which it is safe to use a z
and the normal curve to approximate probabilities. If these conditions
do not hold (when samples are small or proportions are extreme), we can
obtain P exactly by way of the actual binomial expansion for situations
involving one proportion and for two correlated proportions. For
proportions based on independent samples, exact Ps may be ascertained
by another, and more complicated, method to be presented later (p. 236).

The discussion of this chapter is only an introduction to the theory of
statistical inference, or the use of probability in the testing of hypotheses.
We have, however, developed the general principles. The extension of the
theory to hypotheses involving continuous variables for relatively simple
situations will be given in Chapters 6 and 7, with methods for more
complex situations being postponed to later chapters (14-19). In Chapter
13 we shall discuss more extensive procedures for handling hypotheses
regarding frequencies and proportions.


Chapter 6 


INFERENCE: CONTINUOUS 
VARIABLES 


As will be recalled, a frequency distribution for measurements on a
continuous variable is describable with re[...] skewness, and kurtosis; [...]
will be concerned with [...] four features of a frequency [...] need
information regarding [...] measure being used (or [...]

In Chapter 5 we were [...] at the intuitive level, [...] of differences
between [...] normal curve. Unfortunately, [...]tions of the measures that
d[...] readily be determined. Accordingly [...] made by the mathematical
statistician [...] mathematically the characteristics [...] results as a basis
for testing hypotheses [...] his mathematical derivations [...] Since
hypotheses involving means arise frequently in practice and since
inferences based on means serve to illustrate [...]
theory and its use as a basis for hypothesis testing. This chapter will be
restricted to the large sample situation, with requisite sample size specified
at appropriate times.


EMPIRICAL DEMONSTRATION 


The operation of chance sampling errors for means and standard 
deviations can be illustrated by tossing, say, 7 coins 50 times and tabulating 
the number of heads per toss. The obtained frequencies will usually vary 
somewhat from those expected, which would be proportional to 1, 7, 21, 
35, 35, 21, 7, 1 (as obtained by the binomial expansion). When the mean 
number of heads for 50 tosses is computed, it is not likely to be exactly 3.5 
(np, the mean of the expected distribution), and the discrepancy from 3.5 
can be attributed to chance. Likewise, 100 tosses will show departures 
from the expected frequencies, and consequently the mean based on 100 
tosses will differ more or less from 3.5. Furthermore, and for the same 
reason, the standard deviation of the obtained distribution of heads will 
likely differ from 1.323 ($\sqrt{npq}$, the $\sigma$ of the expected frequencies). As an 
exercise the student can demonstrate the foregoing statements by actually 
tossing coins. Indeed it will be quite instructive if each class member 
tosses 7 coins 50 times, each time tallying the number of heads that turn up. 
This will lead to a frequency distribution running (possibly) from 0 to 7 
heads, with an N of 50. Then a second series of 50 tosses should be made, 
thus providing a second distribution. The two frequency distributions can 
be combined, so each student will have three distributions, two with Ns of 
50 and one with an N of 100. Note that chance is so operating as to pro- 
duce a distribution somewhat similar to the expected, but at the same time 
is operating in such a manner as to lead to discrepancies between observed 
and expected frequencies. 
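
For the student who prefers to let a machine do the tossing, the exercise can be sketched roughly in Python as follows. This is only an illustration: the function names and the seed are assumptions introduced here, and a computer's pseudo-random numbers merely stand in for real coins.

```python
import math
import random
from collections import Counter

def toss_series(n_coins=7, n_tosses=50, rng=None):
    """Toss n_coins fair coins n_tosses times; return the number of heads per toss."""
    rng = rng or random.Random()
    return [sum(rng.random() < 0.5 for _ in range(n_coins)) for _ in range(n_tosses)]

def mean_and_sd(scores):
    """Mean M and standard deviation S (sum of squares divided by N)."""
    n = len(scores)
    m = sum(scores) / n
    s = math.sqrt(sum((x - m) ** 2 for x in scores) / n)
    return m, s

rng = random.Random(1)          # arbitrary seed, used only for repeatability
first = toss_series(rng=rng)    # first series of 50 tosses
second = toss_series(rng=rng)   # second series of 50 tosses
combined = first + second       # the distribution with N = 100

# Expected frequencies for 50 tosses, proportional to 1, 7, 21, 35, 35, 21, 7, 1
expected = [50 * math.comb(7, k) / 2 ** 7 for k in range(8)]
observed = Counter(first)
for k in range(8):
    print(k, observed.get(k, 0), round(expected[k], 1))

for label, data in (("first 50", first), ("second 50", second), ("all 100", combined)):
    m, s = mean_and_sd(data)
    print(label, round(m, 3), round(s, 3))

# Values expected from theory: np and the square root of npq
print("expected mean", 7 * 0.5, "expected SD", round(math.sqrt(7 * 0.5 * 0.5), 3))
```

Changing the seed plays the part of a different class member; each seed yields its own three distributions, with means and standard deviations scattered about 3.5 and 1.323.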

Each student should compute the means and the standard deviations 
for each of the three distributions. Note how far these values depart from 
the expected mean of 3.5 and the expected standard deviation of 1.323. 
Then the several means and standard deviations secured by the class 
members should be brought together. In order better to understand what 
happens when each of several persons tosses 7 coins 50 times, i.e., takes a 
sample of 50 tosses, a frequency distribution of the Ms, also of the Ss, 
based on 50 tosses should be made. Likewise a separate distribution should 
be made for the Ms based on 100 tosses; also, the Ss. A study of these 
distributions should provide answers to such questions as: Their central 
tendencies are near what values? What is the extent of dispersion for these 
distributions of Ms and Ss? Is there any difference in the dispersion for 
the distribution of means based on 50 tosses and that based on 100 tosses? 
How would you account for this difference? In general, what is the shape 
of these distributions of Ms and Ss? 

Table 6.1 shows the distributions of the means obtained by several of 
the author's classes. Though these are not models for number of intervals, 
they are nevertheless sufficient as a basis for answering the foregoing 
questions. Note that both distributions appear to be normal, that both 
center very near the mean of the theoretical distribution (3.5), and that the 


Table 6.1. Distribution of 600 means based on 50 tosses 
and 300 means based on 100 tosses of 7 coins 

Mean                           50 Tosses    100 Tosses 

4.00-4.09                           3 
3.90-3.99                          14 
3.80-3.89                          35            4 
3.70-3.79                          50           23 
3.60-3.69                          98           58 
3.50-3.59                         119           78 
3.40-3.49                         120           85 
3.30-3.39                          85           32 
3.20-3.29                          52           17 
3.10-3.19                          21            3 
3.00-3.09                           2 
2.90-2.99                           1 

Number of means                   600          300 
Mean of means                   3.516        3.513 
S* of distribution of means      .190         .135 
Expected S                       .187         .132 

* Corrected for grouping. 

sample size. Note at the bottom of Table 6.1 that the Ss of the distributions 
of means, .190 and .135, are very near the expected values of .187 and .132 
obtained from $1.323/\sqrt{50}$ and $1.323/\sqrt{100}$, respectively. 
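
The summary figures at the bottom of Table 6.1 can be checked from the tabled frequencies. The sketch below rests on two assumptions not stated in the table: that the midpoint of an interval such as 3.50-3.59 is 3.545, and that the correction for grouping is the usual subtraction of one-twelfth of the squared interval width from the variance.

```python
import math

# Lower limits of the intervals and the frequencies, copied from Table 6.1
lower = [4.00, 3.90, 3.80, 3.70, 3.60, 3.50, 3.40, 3.30, 3.20, 3.10, 3.00, 2.90]
f_50 = [3, 14, 35, 50, 98, 119, 120, 85, 52, 21, 2, 1]
f_100 = [0, 0, 4, 23, 58, 78, 85, 32, 17, 3, 0, 0]
WIDTH = 0.10   # interval width

def grouped_mean_sd(freqs):
    """Mean and SD from grouped data; the variance is reduced by WIDTH^2 / 12
    (assumed here to be the grouping correction the table refers to)."""
    mids = [lo + 0.045 for lo in lower]          # assumed interval midpoints
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
    return mean, math.sqrt(var - WIDTH ** 2 / 12)

sigma = math.sqrt(7 * 0.5 * 0.5)                 # 1.323, the SD for 7 coins
for tosses, freqs in ((50, f_50), (100, f_100)):
    mean, sd = grouped_mean_sd(freqs)
    expected_se = sigma / math.sqrt(tosses)
    print(tosses, sum(freqs), round(mean, 3), round(sd, 3), round(expected_se, 3))
```

Under these assumptions the computed values agree with the tabled means of means, with the Ss of the distributions of means, and with the expected Ss.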

Summarizing the results of the foregoing empirical work, we see that 
the means for successive samples tend to distribute themselves normally 
about the expected or universe mean, $\mu$, with a spread or standard 
deviation which is very near the value predicted by mathematical theory. 
The student should keep these empirical distributions and deductions 
therefrom in mind as we now proceed to a more detailed consideration 
of what the mathematical statistician says will happen when successive 
samples of a given size are drawn from a defined universe or population or 
supply. 


MORE SAMPLING THEORY 


The discussion here holds for what is known as simple random sampling. 
As specified in Chapter 5, the conditions for simple random sampling are 
that the sample should be drawn in such a way that each individual 
(person, plant, animal, etc.) in the defined universe shall have an equal 
chance of being included in the sample, and that the drawing of one 
individual shall in no way affect the drawing of another. The aim is, of 
course, to obtain a sample which will, within limits of random or chance 
errors, be representative of the universe from which it was drawn. 

Let 

N = the number of cases, or size of sample. 
M = the mean of any sample (known, i.e., computed). 
S = the standard deviation of any sample (known, i.e., computed). 
$\mu$ = the mean of the defined population (unknown). 
$\sigma$ = the standard deviation of the defined population (unknown). 


The $\mu$ and $\sigma$ are for the distribution of scores or measurements for all 
the individuals in the defined universe. It is not assumed that this universe 
distribution is exactly normal; it may be skewed slightly. Strictly speaking, 
the number, $N_{pop}$, of cases in the universe should be infinitely large, but 
failure to meet this requirement is not serious. As will be seen later, the 
adjustment necessary when a sample of N cases is drawn from a limited 
(finite) universe of $N_{pop}$ cases is of the order of $N/N_{pop}$; if it is known 
that $N_{pop}$ is very large relative to N, the formulations about to be pre- 
sented will be sufficiently accurate for all practical purposes. 

Now suppose we draw a sample of N cases, compute the mean and 
standard deviation, then draw another sample of the same size and 
compute its mean and standard deviation, and so on until a large number 
of samples, say 10,000, have been drawn. We will then have 10,000 
means and 10,000 standard deviations, each based on N cases. When we 
make a distribution of the 10,000 means and of the 10,000 standard 
deviations, we have random sampling distributions. From the point of 
view of mathematical rigor, the number of successive samples should be 
much larger than 10,000, certainly far larger than the 600, or 300, succes- 
sive samples of Table 6.1, in which we have only the beginning of two 
random sampling distributions. 

By rather complex mathematical methods it can be shown that, if 
successive samples of constant size, N, are drawn randomly from a normally 
distributed universe or population with mean equal to $\mu$ and standard 
deviation equal to $\sigma$, the successive sample means will be normally dis- 
tributed about $\mu$, and the standard deviation of this sampling distribution 
will be $\sigma/\sqrt{N}$. The random sampling distribution of the successive stand- 
ard deviations will center at $\sigma$ (there is a small bias here which need not 
concern us at this time). For N large (100 or more) this distribution of Ss 
will be approximately normal with standard deviation equal to $\sigma/\sqrt{2N}$. 
These mathematical findings have often been checked empirically. Table 
6.1 provides a limited check on the sampling theory regarding the mean. 
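
A further, equally limited, check can be had by simulation. The sketch below draws successive samples from a hypothetical normal universe; the values chosen for the universe mean and standard deviation, the sample size, the number of samples, and the seed are all assumptions made for illustration.

```python
import math
import random

def sample_mean_sd(rng, n, mu, sigma):
    """Draw one random sample of n cases; return its mean M and standard deviation S."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return m, s

def sd(values):
    """Standard deviation of a list of values (divisor N)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

MU, SIGMA, N, REPS = 100.0, 16.0, 100, 2000   # hypothetical universe and sampling plan
rng = random.Random(6)
results = [sample_mean_sd(rng, N, MU, SIGMA) for _ in range(REPS)]
means = [m for m, _ in results]
sds = [s for _, s in results]

print("mean of means:", round(sum(means) / REPS, 3), "theory:", MU)
print("SD of means:  ", round(sd(means), 3), "theory:", SIGMA / math.sqrt(N))
print("SD of Ss:     ", round(sd(sds), 3), "theory:", round(SIGMA / math.sqrt(2 * N), 3))
print("mean of Ss:   ", round(sum(sds) / REPS, 3), "(expected a little below sigma)")
```

With only 2,000 samples the agreement with theory is approximate, just as the 600 and 300 means of Table 6.1 give only the beginning of a sampling distribution.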

We are now in position to consider a term used in Chapter 5. In general, 
the standard error of a statistical measure is the standard deviation of the 
sampling distribution for the given measure. The square of the standard 
error is called the sampling variance. For the practical statistician, the 
sampling distribution is hypothetical, and hence its standard deviation 
must be determined by a different formula from that used for computation 
from an actual distribution. The value given by $\sigma/\sqrt{N}$ is called the 
standard error of the mean and may be designated as $\sigma_M$. Each sample 
mean can be expressed in relative deviate form as $(M - \mu)/\sigma_M$, and these 
relative deviates will form a normal distribution with mean of zero and 
standard deviation of unity. By reference to Table A we can readily 
specify the chances of obtaining a sample mean yielding a deviation as 
great as that for a given M, provided the value of $\mu$ is known. But in 
practical work $\mu$ is the unknown about which we desire to make an infer- 
ence on the basis of just one sample. 

Before resolving this practical problem, we must call attention to the fact 
that the universe standard deviation, $\sigma$, needed to obtain $\sigma_M$, is also an 
unknown. A single sample will yield a standard deviation, S, which, 
being a sample value, will of course deviate more or less from $\sigma$. In 
order that an inference about $\mu$ may be made from a single sample, 
$\sigma_M$ is estimated by using $S/\sqrt{N}$; i.e., the unknown $\sigma$ is replaced by the 
sample S as an estimate. Instead of the true value for the standard error 
of the mean as given by $\sigma/\sqrt{N}$, we have an approximate value, $S/\sqrt{N}$. 
Let $S_M$, defined as $S/\sqrt{N}$, stand for the approximate standard error. 

The ignorance concerning $\sigma$, and the consequent approximate value 
for the standard error of a given mean, lead to a reconsideration of the 
sampling distribution of means expressed as relative deviates. As already 
pointed out, the means from successive samples will be distributed 
normally, and the relative deviates, $(M - \mu)/\sigma_M$, will likewise be distri- 
buted normally since $\sigma_M = \sigma/\sqrt{N}$ is a constant. When (as is nearly always 
the case) we have S instead of $\sigma$ and wish to make an inference about a 
universe mean, we need to know something of the sampling behavior of 
successive sample means expressed as relative deviates from $\mu$ where $S_M$ is 
not a constant but varies from sample to sample because the several sample 
standard deviations vary. Thus the relative deviate of the first sample 
mean will be $(M_1 - \mu)$ divided by $S_1/\sqrt{N}$; for the second sample, 
$(M_2 - \mu)$ divided by $S_2/\sqrt{N}$; and so on. The distribution of these 
relative deviates will not approximate normality unless N is fairly large. 
Thus the use of an estimate of $\sigma$ in determining $\sigma_M$ imposes the restriction 
that N shall not be too small. If N is not less than 30, we can safely use the 
normal curve as the basis for drawing an inference or testing a hypothesis 
regarding $\mu$. This chapter's discussion of sampling is therefore not 
applicable unless N is greater than 30. The refinements necessary for Ns 
less than 30 will be given in Chapter 7. 
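
The effect of replacing $\sigma$ by S can be seen in a small simulation. The sketch below, in which the universe, the number of samples, the seed, and the function names are all illustrative assumptions, counts how often the relative deviate $(M - \mu)/(S/\sqrt{N})$ exceeds 2.58 in absolute size; if these deviates were normally distributed, the proportion would be about .01.

```python
import math
import random

def relative_deviate(rng, n, mu=0.0, sigma=1.0):
    """(M - mu) / (S / sqrt(N)) for one random sample, with S computed from that sample."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return (m - mu) / (s / math.sqrt(n))

def share_beyond(n, cutoff=2.58, reps=4000, seed=7):
    """Proportion of successive samples whose relative deviate exceeds the cutoff in size."""
    rng = random.Random(seed)
    hits = sum(abs(relative_deviate(rng, n)) > cutoff for _ in range(reps))
    return hits / reps

for n in (5, 30, 100):
    print(n, share_beyond(n))   # compare each proportion with the normal-curve value of about .01
```

For very small N the proportion runs well above .01, which is the reason for the restriction on sample size stated above.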


HYPOTHESES REGARDING A SINGLE MEASURE 


Whether the foregoing theory is used as a basis for making an inference 
about a population value or for testing some hypothesis depends on the 
practical problem faced by the investigator. We shall now consider hypoth- 
esis testing, and later we shall discuss a type of inference which is useful 
both when we do and do not have a research hypothesis in mind. 

Single mean. The procedure for testing a hypothesis about a population 
mean on the basis of a sample mean (and S) for N cases is very similar to 
that for testing a hypothesis when we have a sample proportion (discussed 
earlier, pp. 48-50). We let $M_h$ stand for a hypothesized value of $\mu$. 
Our sample mean, M, taken as a deviation from $M_h$, is expressed in the 
form of a z, that is, as $(M - M_h)/S_M$. The theory tells us that if $M_h$ is true 
(i.e., corresponds to $\mu$), successive sample Ms will be distributed normally 
about $M_h$ with standard deviation = $S_M$ (approximately). In testing the 
given hypothesis we are merely raising the question as to whether it is 
reasonable to believe that our observed sample mean, M, belongs to a 
sampling distribution centering at $M_h$. Put differently, does M deviate 
significantly from $M_h$? To answer this we need to know the probability 
of as large a deviation on the basis of chance sampling errors, and to get 
this probability we need only enter Table A with $(M - M_h)/S_M$ as a z. If 
we have decided to adopt the P = .01 level for judging significance, we 
reject the hypothesis when $(M - M_h)/S_M$ reaches 2.58 (for a two-tailed 
test) or 2.33 (for a one-tailed test); otherwise we accept the hypothesis. 
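
The procedure may be sketched in a few lines of Python, with the normal-curve probability computed from the complementary error function in place of a reading of Table A. The function names are assumptions introduced here; the figures are those of the Stanford-Binet illustration given in this section.

```python
import math

def z_for_mean(m, m_h, s, n):
    """Relative deviate (M - M_h) / S_M, where S_M = S / sqrt(N)."""
    s_m = s / math.sqrt(n)
    return (m - m_h) / s_m

def two_tailed_p(z):
    """Normal-curve probability of a deviation at least as large as z in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))

def one_tailed_p(z):
    """Normal-curve probability of a deviation at least as large as z in one direction."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))

# Age 7: N = 202, M = 101.78, S = 16.18, hypothesized mean 100
z7 = z_for_mean(101.78, 100, 16.18, 202)
# Age 9: N = 204, M = 104.28, S = 16.42, hypothesized mean 100
z9 = z_for_mean(104.28, 100, 16.42, 204)

for label, z in (("age 7", z7), ("age 9", z9)):
    p = two_tailed_p(z)
    decision = "reject" if p <= 0.01 else "accept"
    print(label, round(z, 2), round(p, 4), decision)
```

The critical values quoted above can be verified the same way: `two_tailed_p(2.58)` and `one_tailed_p(2.33)` both come out very close to .01.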

Actually, there are relatively few occasions in psychological research for 
which either scientific theory or prior observation provides us with a 
hypothesis concerning the mean for a population on some variable. An 
exception is the mean of changes, to be discussed shortly. 

As an example of a situation for which the testing of a hypothesis about 
a mean is appropriate we cite the IQ tests. For reasons which we shall not 
discuss here, a properly constructed test should yield 100 as the average 
IQ for the population of children for any given age level. Consider Form 
L of the 1937 Revision of the Stanford-Binet Scale. For age 7, a sample of 
202 gives a mean of 101.78 and an S of 16.18. The value of $S_M$ becomes 
$16.18/\sqrt{202} = 1.14$. From these figures we have $(M - M_h)/S_M = (101.78 
- 100)/1.14 = 1.56$ as a z. Turning to Table A we find that the P for as 
large a deviation (irrespective of direction—a two-tailed test is needed 
here) from 100is .12. Since this probability is not as small as our arbitrarily 
chosen P = .01 level of significance, we accept the hypothesis that the 
1937 Stanford-Binet meets the requirement of yielding an average of 100 
at age 7. That the scale was not entirely satisfactory in this regard is 
evident when we consider the M of 104.28 and S of 16.42 for a sample of 
204 nine year olds. We have $S_M = 1.15$, which leads to a z of $(104.28 
- 100)/1.15 = 3.72$. Since the probability of as large a deviation is about 
.0002, we reject the hypothesis that the scale would yield a population 
mean of 100 at age 9. 

Significance of mean change. A frequently encountered problem is 
that of evaluating changes in order to say whether some provided experi- 
ence or change in conditions leads to a shift in performance. 
Let 

$X_1$ = score prior to experience (or under one condition). 
$X_2$ = score after the experience (or under second condition). 
$D = X_2 - X_1$ = change score. 



Or we might take $D = X_1 - X_2$ if losses instead of gains are of interest, 
but regardless of which way we define the D score, the subtraction is 
made in the same direction for all N cases and negative signs are kept. 
A sample of N individuals will give us N changes, or N Ds. We can either 
make or conceive of a distribution of the Ds. This distribution will have 
a mean, $M_D$, and a standard deviation, $S_D$, whence we can get the standard 
error of the mean difference: $S_{M_D} = S_D/\sqrt{N}$. In other words, a mean 
change is treated just like any other mean. Regardless of any hunch or 
prediction about the effect of the experience (or the effect of the change in 
conditions), the null hypothesis is set that there is no effect. This is equi- 
valent to saying that, if we had $X_1$ and $X_2$ scores on the defined population, 
the value of $\mu_D$ would be zero. If this hypothesis is true and if we were to 
take successive samples of size N, we would expect that the sample means 
would be distributed normally about zero with $\sigma = S_{M_D}$. To test the null 
hypothesis we simply take our obtained $M_D$ as a deviation from the null 
value of zero and divide by $S_{M_D}$. That is, $(M_D - 0)/S_{M_D} = M_D/S_{M_D}$. 
This as a z is then used as an entry into Table A in order to specify the 
probability of as large a mean difference as our sample $M_D$ arising solely 
on the basis of chance sampling. Whether we reject or accept the hypoth- 
esis of no effect depends on whether P does or does not reach the chosen 
level of significance. We could use a one-tailed test here if the research 
hypothesis predicted the direction of the change, but if we had no a priori 
hypothesis as to the direction of change we would need to use the two-tailed 
test. 
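
A minimal sketch of the test of a mean change follows. The before and after scores are hypothetical, manufactured by an arbitrary rule merely to supply 36 paired values with positive, negative, and zero changes; the variable names are likewise assumptions.

```python
import math

# Hypothetical before (x1) and after (x2) scores for N = 36 persons
x1 = [50 + (7 * i) % 11 for i in range(36)]
x2 = [x + (5 * i) % 7 - 2 for i, x in enumerate(x1)]

# D = X2 - X1, subtraction always in the same direction, negative signs kept
d = [after - before for before, after in zip(x1, x2)]
n = len(d)

m_d = sum(d) / n                                        # mean change (algebraic sum over N)
s_d = math.sqrt(sum(v * v for v in d) / n - m_d ** 2)   # SD of the changes
s_md = s_d / math.sqrt(n)                               # standard error of the mean change
z = (m_d - 0) / s_md                                    # deviation from the null value of zero

p_two = math.erfc(abs(z) / math.sqrt(2))   # two-tailed: no prediction as to direction
p_one = p_two / 2                          # one-tailed: only if the change is in the predicted direction

print(round(m_d, 3), round(s_d, 3), round(s_md, 3), round(z, 2), round(p_two, 4))
```

Note that the squares of the Ds are summed without regard to sign, whereas the sum of the Ds themselves is algebraic.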

A word should be inserted about the required computations since there 
is some danger of confusion when we are confronted with the calculation 
of M and S for scores (changes) which are both positive and negative, and 
sometimes zero. The gross score formula for the mean (3.2) and that for 
the standard deviation (3.7) are applicable provided we take $\sum D$ (equi- 
valent to $\sum X$) as the algebraic sum. The equivalent of $\sum X^2$, that is, $\sum D^2$, 
raises no problem since the squaring process automatically eliminates 
negative signs. There are two reasons why we should make a frequency 
distribution of the Ds. First, the theory assumes that the Ds approximate a 
normal distribution; if a distribution is made we have at least a rough check 
on this assumption (there are statistical methods for checking this assump- 
tion; see p. 79 and also p. 231). Second, if N is sizable, computation from 
a frequency distribution is more economical of time than use of the gross 
score formulas. In laying out the intervals, we must provide a place for 
tabulating zero Ds. This can conveniently be accomplished by the following 
illustrative scheme which includes only the four intervals near zero: 
2 to 3, 0 to 1, -1 to -2, -3 to -4 (for i = 2); 3 to 5, 0 to 2, -1 to -3, -4 to -6 (for i = 3); 
4 to 7, 0 to 3, -1 to -4, -5 to -8 (for i = 4); etc. Note that the last given intervals 
in each set are for negative Ds. AO taken as the midpoint of the bottom 
interval will be a negative number, and must be treated as such when 
entered into formula (3.3). 

Other single measures. The general theory of statistical inference 
may be extended to testing hypotheses concerning any descriptive measure, 
provided information is available (from the mathematical statistician) 
concerning the characteristics of the random sampling distribution of the 
measure. When the sampling distribution is normal in form with known 
or estimable variability, we may proceed to test hypotheses by setting up a 
z, or $x/S$, or $x/\sigma$. For this purpose we need formulas for the standard 
errors of different measures. The formulas about to be presented are 
based on the assumption that the score distribution is normal or approxi- 
mately so. 
As previously noted, for N greater than 30 we may safely use 

$$S_M = \frac{S}{\sqrt{N}} \qquad (6.1)$$

as the standard error of the mean. For N greater than 100 it is safe to take 

$$S_{mdn} = \frac{1.253\,S}{\sqrt{N}} \qquad (6.2)$$


as the standard error of the median. A comparison of the standard error 
of the mean with that of the median indicates that the mean fluctuates less 
than the median; i.e., the mean is a more stable measure of central value 
than the median. In order to reduce the standard error of the median to 
the same magnitude as that of the mean it is necessary to take 57 per cent 
more cases, i.e., increase N by 57 per cent. It follows from this that the use 
of the median for distributions which are reasonably normal in form is 
equivalent to throwing away a large proportion of the cases. 
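
The figure of 57 per cent follows directly from the constant in (6.2), as the short computation below indicates. The function names are assumptions, and the constant is taken as 1.2533, i.e., $\sqrt{\pi/2}$, of which 1.253 is the rounded value.

```python
import math

MEDIAN_FACTOR = math.sqrt(math.pi / 2)   # 1.2533, rounded to 1.253 in formula (6.2)

def se_mean(s, n):
    """Standard error of the mean, formula (6.1)."""
    return s / math.sqrt(n)

def se_median(s, n):
    """Standard error of the median, formula (6.2); normal score distribution assumed."""
    return MEDIAN_FACTOR * s / math.sqrt(n)

# Equal standard errors require N for the median to be larger by the factor 1.2533 squared
extra_cases = MEDIAN_FACTOR ** 2 - 1
print(round(100 * extra_cases), "per cent more cases")

# Illustration with hypothetical S = 10: a median on 157 cases vs. a mean on 100 cases
print(round(se_mean(10, 100), 3), round(se_median(10, 157), 3))
```

Put the other way round, a median based on 157 cases is about as stable as a mean based on 100.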

The sampling errors involved in measures of dispersion are 


$$S_S = \frac{S}{\sqrt{2N}} = \frac{.707\,S}{\sqrt{N}} = .707\,S_M \qquad (6.3)$$

$$S_{AD} = \frac{.756\,(AD)}{\sqrt{N}}$$

$$S_Q = \frac{1.166\,(Q)}{\sqrt{N}}$$


From these error formulas it will be seen that, considering the error relative 
to the magnitude of the measures of dispersion, S is the most stable measure 
of variation. Provided N is 100 or more, the sampling distributions for 
these measures of dispersion are such that their standard errors can be 
utilized in exactly the same way as the standard error of the mean. 


The standard errors for measures of skewness and kurtosis, as defined 
on p. 26, are 

$$S_{g_1} = \sqrt{\frac{6}{N}} \qquad (6.4)$$

$$S_{g_2} = \sqrt{\frac{24}{N}} \qquad (6.5)$$

These two formulas are based on the assumption that the sample has been 
drawn from a normally distributed population, and therefore they can be 
legitimately used in testing the assumption of normality. It will be recalled 
that, for normal distributions, both $g_1$ and $g_2$ are equal to zero, but for a 
sample they may not be zero; however, sample values should not show 
a greater deviation from zero than can be reasonably attributed to chance. 
If a sample yields a $g_1$ value which is more than, say, 2.58 times its sampling 
error, we would suspect that the sample was not drawn from a symmetri- 
cally distributed supply. Likewise, if $g_2$ deviates more than 2.58 times its 
standard error, we would question whether it is reasonable to believe that 
the population or supply is distributed with normal kurtosis. A two-tailed 
test is appropriate here, and consequently choosing 2.58 is equivalent to 
adopting the .01 level of significance. 
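
The check on normality can be sketched as follows. Two assumptions are made and should be noted: that $g_1$ and $g_2$ are the usual moment measures, $m_3/m_2^{3/2}$ and $m_4/m_2^2 - 3$ (the definitions of p. 26 are not reproduced here), and that the standard errors are the large-sample values $\sqrt{6/N}$ and $\sqrt{24/N}$. The scores are a hypothetical flat distribution chosen because its kurtosis is known to depart from the normal.

```python
import math

def g1_g2(scores):
    """Skewness g1 and kurtosis g2 from moments about the mean (assumed definitions)."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    m4 = sum((x - m) ** 4 for x in scores) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def normality_check(scores, critical=2.58):
    """Compare g1 and g2 with their assumed standard errors, two-tailed at the .01 level."""
    n = len(scores)
    g1, g2 = g1_g2(scores)
    se_g1 = math.sqrt(6.0 / n)
    se_g2 = math.sqrt(24.0 / n)
    return {
        "g1": g1,
        "g2": g2,
        "g1_over_se": g1 / se_g1,
        "g2_over_se": g2 / se_g2,
        "suspect_skewness": abs(g1 / se_g1) > critical,
        "suspect_kurtosis": abs(g2 / se_g2) > critical,
    }

scores = list(range(1, 202))   # hypothetical rectangular distribution, N = 201
result = normality_check(scores)
for key, value in result.items():
    print(key, value)
```

For these flat scores the skewness gives no ground for suspicion, whereas the kurtosis departs from zero by more than 2.58 times its standard error.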


HYPOTHESES ABOUT DIFFERENCES 


One of the foremost problems in practical statistics is the comparison 
of group trends. We may wonder whether one college group is superior to 
another, whether practice on a task improves performance, whether rats 
learn more rapidly when food or when water is the incentive, whether 
reaction time is faster to sound than to light, whether the sexes show a 
difference in variational tendency, whether one learning method is better 
than another, etc. In order to answer questions like the above, it is neces- 
sary to make observations on samples from two groups or on the same 
group under two different experimental conditions, and then to compute 
appropriate statistical measures for the variable on which we wish to make 
the comparison. 

Thus, typically, we have two samples of $N_1$ and $N_2$ cases or two sets of 
scores on just N cases under two different conditions, with means $M_1$ and 
$M_2$ and standard deviations $S_1$ and $S_2$, where the subscripts refer to the two 
sets of scores. As we have learned, each mean is subject to sampling 
fluctuations; therefore the difference between the means will also be 
subject to sampling fluctuations. Even though $\mu_1 = \mu_2$ there may be a 
difference between sample means because of chance sampling errors. To 
test an obtained difference for significance we will need a measure of the 
sampling error of differences, i.e., the standard error of the difference 
between two means. Knowing this standard error we can set up the 
null hypothesis that there is no difference between the two population 
means and then reject or accept this hypothesis according to whether the 
obtained difference does or does not reach an appropriate level of signifi- 
cance. 

Here, as in the case of the difference between proportions, we must 
distinguish between the situation where our two means are based on 
independent as opposed to nonindependent (correlated) scores. 

Difference between correlated means. Let us again consider the method 
outlined previously for testing the significance of a mean change. As 
implied there, the $X_1$ and $X_2$ scores could stand for performance for N 
individuals under two different conditions. A little simple algebra at this 
point will lead to some interesting results. As before, we let 

$$D = X_2 - X_1$$

By definition the mean of the distribution of these N difference scores 
will be 

$$M_D = \frac{\sum D}{N} = \frac{\sum (X_2 - X_1)}{N} = \frac{\sum X_2}{N} - \frac{\sum X_1}{N}$$

hence 

$$M_D = M_2 - M_1 = D_M$$

by which we see that the mean of the difference is equal to the difference 
between the means. This will, of course, be true for every sample. It 
follows therefore that when we test the significance of $M_D$ as a deviation 
from zero we are also testing the significance of $D_M$ as a deviation from 
zero. In other words, we are testing the significance of the difference 
between two means based on the same N cases. 


When testing $M_D$, we calculated $S_D$, thence $S_{M_D}$. Let us consider a bit 
further the standard deviation of the distribution of differences, $S_D$. We 
first express the Ds as deviations from their own mean, i.e., $d = D - M_D$. 
Since $D = X_2 - X_1$ and $M_D = M_2 - M_1$, we have 

$$d = (X_2 - X_1) - (M_2 - M_1)$$

which, when the parentheses are removed and the terms shifted, becomes 

$$d = X_2 - M_2 - X_1 + M_1$$

or 

$$d = (X_2 - M_2) - (X_1 - M_1)$$

Both these new parentheses terms define deviation units of the type 
$x = X - M$, so that $d = x_2 - x_1$. The standard deviation squared, or 
variance, of the difference can be expressed by substituting d for x in 
formula (3.4); thus 

$$S_D^2 = \frac{\sum d^2}{N}$$

If we replace d by its equivalent, we have 

$$S_D^2 = \frac{\sum (x_2 - x_1)^2}{N} = \frac{\sum x_2^2}{N} + \frac{\sum x_1^2}{N} - \frac{2\sum x_1 x_2}{N}$$

The first two of the three terms on the right are obviously the variances for 
the second and first sets of scores. The last term, involving the sum of the 
cross products of $x_1$ and the $x_2$ with which it is paired, has to do with the 
degree of correlation between, or similarity of, the scores that belong to the 
same individual. The reader is asked to take on faith, without further 
explanation here, the fact that the last term becomes $2r_{12}S_1S_2$, in which r is a 
measure of correlation. Hence we can write 

$$S_D^2 = S_1^2 + S_2^2 - 2r_{12}S_1S_2 \qquad (6.6)$$

or 

$$S_D = \sqrt{S_1^2 + S_2^2 - 2r_{12}S_1S_2}$$


Since the standard error of any mean is given by dividing the standard 
deviation by the square root of N, we secure the standard error of the 
mean difference by dividing $S_D$ by $\sqrt{N}$, i.e., 

$$S_{M_D} = \frac{S_D}{\sqrt{N}} = \frac{\sqrt{S_1^2 + S_2^2 - 2r_{12}S_1S_2}}{\sqrt{N}}$$

or 

$$S_{M_D} = \sqrt{\frac{S_1^2}{N} + \frac{S_2^2}{N} - \frac{2r_{12}S_1S_2}{N}}$$

The first two terms under the last radical are the sampling variances of the 
two means, and since $2r_{12}S_1S_2/N$ can be written as 
$2r_{12}(S_1/\sqrt{N})(S_2/\sqrt{N})$, we have finally that 

$$S_{M_D} = \frac{S_D}{\sqrt{N}} = \sqrt{S_{M_1}^2 + S_{M_2}^2 - 2r_{12}S_{M_1}S_{M_2}}$$

Since each $M_D = D_M$ it follows that $S_{M_D} = S_{D_M}$, or that the standard 
error of the mean difference is equal to the standard error of the difference 
between the two means. Thus we have two ways for evaluating a difference 
between nonindependent means. We can compute $M_D$, $S_D$; thence 

$$S_{M_D} = \frac{S_D}{\sqrt{N}} \qquad (6.7)$$

or we can compute $M_1$, $M_2$, $S_1$, $S_2$, and r, and then obtain 

$$S_{M_D} = \sqrt{S_{M_1}^2 + S_{M_2}^2 - 2r_{12}S_{M_1}S_{M_2}} = S_{D_M} \qquad (6.8)$$

Formula (6.8) is usually referred to as the standard error of the difference 
between correlated means, hence the symbol $S_{D_M}$. 

But by working with the difference between paired scores, we can obtain 
the standard error of the mean difference (= difference between means) 
without computing r. Even after we have learned how to compute r, it 
matters not whether we compute the standard error of the difference 
between means of related scores by formula (6.8), or whether we compute 
its equivalent, the standard error of the mean of the differences. 

Strictly speaking, the $r_{12}$ in (6.8) should be written as $r_{M_1M_2}$ so as to 
indicate that it is a measure of the extent to which successive pairs of 
means vary together, but it can be shown that the correlation between 
means is the same as $r_{12}$, the correlation between the scores entering into 
the means. 


Since $M_D = D_M$ and $S_{M_D} = S_{D_M}$, it should be obvious that when 
testing the null hypothesis we have 

$$z = \frac{M_D}{S_{M_D}} = \frac{D_M}{S_{D_M}}$$

That is, the procedure for testing the null hypothesis that $\mu_D$ is zero for a 
population is equivalent to testing the null hypothesis that $\mu_1 = \mu_2$, where 
the subscripts 1 and 2 indicate that we are considering two populations of 
scores, one for each condition. 
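
That (6.7) and (6.8) lead to the same standard error can be verified numerically. The sketch below uses hypothetical paired scores generated by an arbitrary rule; the helper functions, including the product-moment computation of r, are assumptions supplied only to make the example self-contained.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Standard deviation with divisor N."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

def corr(a, b):
    """Product-moment correlation between paired scores."""
    ma, mb = mean(a), mean(b)
    cross = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cross / (sd(a) * sd(b))

# Hypothetical paired scores for N = 40 persons under two conditions
x1 = [50 + (7 * i) % 23 for i in range(40)]
x2 = [x + (5 * i) % 9 - 3 for i, x in enumerate(x1)]
n = len(x1)

# Formula (6.7): work with the differences, no r needed
d = [b - a for a, b in zip(x1, x2)]
se_67 = sd(d) / math.sqrt(n)

# Formula (6.8): work with the two means, their standard errors, and r
s_m1 = sd(x1) / math.sqrt(n)
s_m2 = sd(x2) / math.sqrt(n)
r12 = corr(x1, x2)
se_68 = math.sqrt(s_m1 ** 2 + s_m2 ** 2 - 2 * r12 * s_m1 * s_m2)

z_67 = mean(d) / se_67
z_68 = (mean(x2) - mean(x1)) / se_68
print(round(se_67, 6), round(se_68, 6), round(z_67, 4), round(z_68, 4))
```

The two routes agree except for rounding in the machine arithmetic, so the choice between them is one of computational convenience.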

Formulas (6.7) and (6.8) are appropriate in a number of situations in 
which an $X_1$ score is somehow paired with an $X_2$ score. Some of the possi- 
bilities are the following: 

(a) $X_1$ as first trial - practice - $X_2$ as later trial; same person. 

(b) $X_1$ as initial - experience - $X_2$ as final; same person. 

(c) $X_1$ as pretest - experience - $X_2$ as posttest; same person. 

(d) $X_1$ under experimental conditions vs. $X_2$ under normal (or control); 
    same person. 

(e) $X_1$ in one experimental condition vs. $X_2$ in another; same person. 

(f) $X_1$ as experimental vs. $X_2$ as control; twin or litter pair. 

(g) $X_1$ as experimental vs. $X_2$ as control; unrelated persons, but matched 
    by pairing on pertinent variables. Ditto, for two experimental 
    conditions. 


For situation (g), which is commonly employed in experimental work, 
we can think of having drawn N individuals at random for one group, then 
forming the second group by selecting individuals who can be paired with 
the members of the first group on the basis of variables which need to be 
controlled. Thus any found difference between $M_1$ and $M_2$ will not be 
attributable to differences between the two groups with respect to the 
variables used in forming the pairs, since the pairing tends to make the 
groups equivalent on the pairing variables. This same pairing procedure, 
and also twin or litter pairs, can be used for situation (e). Furthermore, as 
we shall see below, the $X_1$ and $X_2$ scores can themselves stand for changes: 
$X_1$ the change from pretest to posttest under an experimental condition 
and $X_2$ the change under another experimental condition or under control 
conditions. 

The statistical advantages of having scores which are somehow related 
will be discussed later under the caption "Reduction of sampling errors." 

Difference between independent means. When we have means for 
two samples which have been drawn independently, there will be no way of 
pairing scores except on a chance basis and chance pairing will tend to 
produce a zero correlation. In fact, if we took all possible pairs the 
correlation would be exactly zero. Thus the correlation term in (6.8) 
vanishes, so that the standard error of the difference between means based 
on independent samples becomes 

$$S_{D_M} = \sqrt{S_{M_1}^2 + S_{M_2}^2} = \sqrt{\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}} \qquad (6.9)$$

This formula is not restricted to samples of the same size; i.e., $N_1$ need not 
equal $N_2$. The right-hand form of (6.9) has an obvious computational 
advantage. 

The $S_{D_M}$ obtainable by formula (6.9) may be used in exactly the same 
manner as the standard error of the difference by formulas (6.7) and (6.8). 


Again, we set the null hypothesis that $\mu_1 = \mu_2$ or that the difference 
between the population means is zero. If it is zero, the sampling distri- 
bution of $D_M$ resulting from successive replications will center at zero with 
standard deviation = $S_{D_M}$. If $D_M/S_{D_M}$ (or z) is sufficiently large, the null 
hypothesis is rejected; if not, it is accepted. In other words the general 
procedure for testing hypotheses about differences is precisely the same 
for means (and other statistical measures) as that outlined in Chapter 5. 
The student would do well to review the discussion dealing with admissible 
hypotheses, one-tailed vs. two-tailed tests, choice of level of significance, 
and the two types of error one risks in testing hypotheses. 

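
As a sketch of the test for independent means by formula (6.9), consider the computation below. The summary figures for the two groups are hypothetical, and the function name is an assumption.

```python
import math

def se_diff_independent(s1, n1, s2, n2):
    """Standard error of the difference between independent means, right-hand form of (6.9)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Hypothetical summary figures for two independently drawn samples
m1, s1, n1 = 52.3, 10.2, 120
m2, s2, n2 = 49.1, 9.6, 150

se = se_diff_independent(s1, n1, s2, n2)

# Left-hand form of (6.9): the two standard errors of the means, squared and summed
s_m1 = s1 / math.sqrt(n1)
s_m2 = s2 / math.sqrt(n2)
se_left = math.sqrt(s_m1 ** 2 + s_m2 ** 2)

z = (m1 - m2) / se
p_two = math.erfc(abs(z) / math.sqrt(2))   # two-tailed probability from the normal curve

print(round(se, 4), round(se_left, 4), round(z, 3), round(p_two, 4))
print("reject" if p_two <= 0.01 else "accept", "the null hypothesis at the .01 level")
```

The two forms of (6.9) give the same value; the right-hand form merely spares the separate computation of the two standard errors.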
Differences between other descriptive measures. The general theory 
of hypothesis testing is applicable for descriptive measures other than pro- 
portions or means. The general pattern for the standard error of the differ- 
ence between any two statistical measures, say $C_1$ and $C_2$, is 

$$S_{D_C} = \sqrt{S_{C_1}^2 + S_{C_2}^2 - 2r_{C_1C_2}S_{C_1}S_{C_2}}$$

That is, we need to know the standard error for both $C_1$ and $C_2$ and a 
measure of the correlation between $C_1$ and $C_2$ in case of nonindependence 
(the r term drops out for independently drawn samples). This correlation, 
which is a measure of the extent to which $C_1$ and $C_2$ vary together when 
successive samples are drawn, is known to be $r_{12}$ when the Cs are means 
and to be $r_{12}^2$ when the Cs are standard deviations, with $r_{12}$ being the 
correlation between the scores entering into the means and standard 
deviations. Accordingly, the standard error of the difference between two 
Ss based on the same individuals or on scores related consanguineously or 
related by pairing on pertinent variables is given by 

$$S_{D_S} = \sqrt{S_{S_1}^2 + S_{S_2}^2 - 2r_{12}^2 S_{S_1}S_{S_2}} \qquad (6.10)$$

and for Ss based on independent samples 

$$S_{D_S} = \sqrt{S_{S_1}^2 + S_{S_2}^2} = .707\,S_{D_M} \qquad (6.11)$$

These formulas are valid for large Ns (100 or more), and to test the null 
hypothesis we simply take $D_S/S_{D_S}$ as a unit normal z. (For Ns small, see 
Chapter 14.) 

The difference between medians based on correlated scores cannot be 
tested because the needed r is unknown, but for independent samples we 


have 
= 4/52 2 
n S mdn, T S тап, 


Expressions for Sj, | and for Sp, can be similarly written for the case of 
independent samples. 

Any student who is worried because formula (5.5) for the standard error 
of the difference between correlated proportions does not include an r term 
may rest assured that the correlation has been allowed for even though not 
visibly so. Formula (5.5) is analogous to formula (6.7), which we have seen 
is equivalent to the longer formula (6.8) in which there is an r. 




REDUCTION OF SAMPLING ERRORS 


One of the aims of scientific method is to attain as great precision in 
results as is practicable. In statistical work this can be accomplished by 
increasing the accuracy or dependability of the scores or individual 
measurements or responses and by decreasing the chance sampling errors 
of the various descriptive measures. One way to reduce sampling errors is 
to employ either the stratified or the area method of sampling, both 
of which are too complicated for us to discuss here. If the random 
sampling method is being used in projects which aim to study the difference 
between groups (or populations), the obvious, and only, way for decreasing 
the standard error of the difference is to increase N for either or for both 
samples. Most field investigations are of this type.
In contrast, the experimentalist can define his population with reference
to two laboratory or experimental situations, i.e., a population of
individuals under situation A and a population of individuals under
situation B. His sample individuals for the two situations may be the same
individuals, first under the A and then under the B condition. In general,
the use of the same individuals, if feasible in view of possible practice or
fatigue effects, will usually involve a fairly high degree of correlation,
the net effect of which is to reduce the standard error of the difference
considerably; i.e., it is sometimes possible to reduce sampling error simply
by using the same individuals as the “two” samples. Thus, if we wish to
study the effect of two different degrees of humidity on mental output or
efficiency, it will be a more economical and better controlled experiment
if we make observations on the same individuals under the two conditions
A and B, rather than on $N_1$ individuals under condition A and $N_2$
individuals under condition B.

If it is not feasible to use the same individuals in the two experimental
situations, we can make up two groups by pairing or matching individuals
on the basis of one or more characteristics. Such a procedure leads to
more nearly comparable groups for our experiment than can be obtained
by choosing individuals at random and, by using either formula (6.7) or
(6.8) instead of (6.9), we can make allowance for the fact that the
individuals for the two samples have not been chosen independently. The use
of individuals who have been paired is considered good experimental
technique: it cannot be said that a found difference between means for the
variable being studied may be due to a lack of comparability of the two
groups with respect to the matching variables. The use of paired
individuals has a statistical as well as experimental advantage in that the
sampling error of the difference between means is thereby reduced without
the necessity of increasing the number of cases. If pairing produces an r
of .75, the reduction in $S_{D_M}$ is equivalent to that achieved by
quadrupling the number of cases when the random method of forming groups is
employed. After the student has learned about correlation he will better
appreciate the fact that the gain in pairing depends on the extent to which
the variables used in pairing are correlated with the variable being
studied.
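
The claim about an r of .75 can be checked numerically. The sketch below assumes equal standard deviations in the two groups and uses the general pattern for the standard error of a difference, with each mean's standard error taken as S divided by the square root of N; the function name and the figures are hypothetical.

```python
import math

def se_diff_means(s1, s2, n, r=0.0):
    """Standard error of the difference between two means, each based on n cases.

    r is the correlation between the paired scores (0 for randomly formed groups).
    """
    v1, v2 = s1 ** 2 / n, s2 ** 2 / n          # squared standard errors of the means
    return math.sqrt(v1 + v2 - 2 * r * math.sqrt(v1 * v2))

paired = se_diff_means(10, 10, 25, r=0.75)      # 25 matched pairs, r = .75
random_4n = se_diff_means(10, 10, 100, r=0.0)   # random groups, four times the cases
print(round(paired, 4), round(random_4n, 4))
```

Both calls give the same standard error, since multiplying the sampling variance by (1 - .75) has the same effect as dividing it by 4.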

It is thus seen that, for some types of investigations, greater precision 
can be obtained by judicious planning. If we had unlimited resources, we 
could always attain any desired degree of precision by simply taking 
sufficiently large samples. 

Frequently the question is raised as to how many cases should be secured 
for a given study. The answer might be in terms of the number needed to
reach a given degree of accuracy, but this in turn would raise the question
of what degree of precision is needed, and this in turn depends on how
small a difference we wish to detect. When group comparisons are made
and when the Ns are relatively small, the null hypothesis is apt to be
accepted too often for the simple reason that a real difference has to be 
sizable before it is demonstrable by small samples. On the other hand, if a 
real difference is so small that its statistical demonstration requires 
thousands of cases, we may question whether it has practical or scientific 
importance. 


COMPARISON OF CHANGES 


Although the comparison of changes involves nothing new in the way 
of statistical theory, such comparisons are somewhat more complicated 
than the tests of significance so far discussed. The researcher may be 
interested in either of two questions. First, he may wish to evaluate the 
effect of only one experimental condition or, second, he may wish to 
contrast the changes produced under two (or more) different experimental 
conditions. 

For the first of these, a sample is selected, measurements are made prior 
to (pretest) and subsequent to (posttest) the provided condition, but, since 
changes from a first to a second measure might occur because of practice 
effect or because of some other experience beyond the control of the investi- 
gator, it is necessary to set up a control group the members of which are 
measured and then remeasured, at chronological times corresponding as 
closely as possible to those of the pretest and posttest of the experimental 
group. It is presumed that all uncontrollable effects will be operating
similarly on both groups so that any difference in change for the two groups 
will have resulted from whatever was done to the members of the experi- 
mental group. The statistical problem is that of evaluating the change 
shown by the experimental group compared with that shown by the 
controls.

For the second type of question the investigator starts with two experi- 
mental groups, one of which is subjected to one experimental condition 
and the other to a second experimental condition, both groups having been 
measured prior to the experience (pretest), and then again after the 
experience (posttest). Since the question is concerned with contrasting 
gains (or losses) associated with the two conditions, a control group is not 
needed. Presumably, uncontrollable factors are alike for the two groups. 
The statistical analysis consists of testing for significance the difference 
between the changes shown by the two groups. 

Whether we are dealing with a problem calling for an experimental and 
a control group or for two experimental groups, the two groups may be 
drawn at random or formed on the basis of the pairing of individuals on 
pertinent variables. If the groups are set up on the basis of pairing, we need
to allow for that fact when determining the required standard error of the
difference between changes. 

Parenthetically, it may be said that the setup which involves an experi- 
mental and a control group (or two experimental groups) for studying 
shifts has led to a great deal of confusion regarding the proper statistical 
handling of the data. We have a total of four means, for the pretest and 
the posttest for each of the two groups. By using a combination of sub- 
scripts, 1 and 2 for the pretest and posttest, and E and C to represent the 
two groups, we can specify the means as $M_{E1}$, $M_{E2}$, $M_{C1}$, and $M_{C2}$. Not
all the possible differences between these four will have meaning. Those 
that have meaning may be set forth as: 


$D_E = M_{E1} - M_{E2}$, the change shown by the experimental group.
$D_C = M_{C1} - M_{C2}$, the change shown by the control group.
$D_1 = M_{E1} - M_{C1}$, the pretest difference between the groups.
$D_2 = M_{E2} - M_{C2}$, the posttest difference between the groups.


Which of these four meaningful differences should we test for
significance? Obviously, it is insufficient to test only $D_E$ because we cannot
be sure that the shift shown, even though nonchance, is really due to the
interpolated experience. In fact, the reason for having the control group
is to enable us to evaluate the shift which takes place as a result of causes
other than the experimentally provided experience. Now it might be
thought that if $D_E$ is significant while $D_C$ is less, or not at all, significant,
an effect has been demonstrated. This type of comparison, however, does
not provide a check on the net change. Some have argued that if $D_2$ is
significant while $D_1$ is not, it may be safely concluded that the interpolated
experience has had an effect. This comparison also fails to test the net
change. We should test the significance of the difference between the two
changes, i.e., $D = D_E - D_C$, in order to gauge properly the net shift.
Although, as regards absolute magnitude, $D_E - D_C$ will always equal
$D_2 - D_1$, it is easier to evaluate the former difference.

To get the standard error of D (= $D_E - D_C$) when the groups have
been independently drawn we need the sampling variance of $D_E$ and $D_C$ so
as to substitute in

$$S_{D_D} = \sqrt{S_{D_E}^2 + S_{D_C}^2} \tag{6.12}$$

Now since $D_E = M_{E1} - M_{E2}$ is the difference between two means based
on the same persons, we could get the standard error of $D_E$ by using
formula (6.8), but since the difference between correlated means is equal
to the mean difference, $M_D$, we can use formula (6.7) to get the required
$S_{D_E}^2$. This same situation holds for the control group, so (6.7) would also
be used to get $S_{D_C}^2$.

If the experimental and control groups have been formed by pairing,
our standard error of the difference between changes will require an r 
term to enable us to take advantage of the fact that we have a better 
controlled experiment. The required r is the correlation between the 
changes shown by the members of the pairs; to compute it we need to 
consider the paired changes. We can, however, get the standard error 
of the difference by way of the algebraic difference between the changes 
shown by the members of the pairs, without computing an r. 

Let $X_1$ and $X_2$ stand for pretest and posttest scores and let the members
of the Jth pair be designated as J and J', with J assigned to the
experimental, and J' to the control, group. Each individual will have a
change score which is nothing more than his pretest score minus his
posttest score. Thus the change score for the members of the Jth pair will
be

$$C_J = D_J = X_{1J} - X_{2J} \quad \text{and} \quad C_{J'} = D_{J'} = X_{1J'} - X_{2J'}$$

Hence the difference between the changes (or differences) shown by the
members of any pair will be

$$D = (C_J - C_{J'}) = (D_J - D_{J'}) = (X_{1J} - X_{2J}) - (X_{1J'} - X_{2J'})$$

For N pairs we will have N Ds. These Ds are tedious to compute since
one must preserve the same direction for each subtraction and keep track
of signs. The process can be made somewhat simpler by removing the
parentheses, thus

$$D = X_{1J} - X_{2J} - X_{1J'} + X_{2J'}$$

Simply add $X_{1J}$ and $X_{2J'}$ and then subtract the sum of $X_{2J}$ and $X_{1J'}$,
with the sign for D depending on whether the first or the second of these
two sums is the larger.

Once the N Ds have been determined, we can get $M_D$, $S_D$, and thence
$S_{M_D}$ by formula (6.7). This $M_D$ will equal $D_E - D_C$, or
$(M_{E1} - M_{E2}) - (M_{C1} - M_{C2})$, and this $S_{M_D}$ will be exactly the same as

$$S_{D_D} = \sqrt{S_{D_E}^2 + S_{D_C}^2 - 2 r_{D_E D_C} S_{D_E} S_{D_C}}$$

After the student has learned how to compute r, he may prefer to use this
longer formula for $S_{D_D}$ (equivalent to $S_{M_D}$) rather than go through the
tedium of differencing differences. Regardless of how the standard error
of the difference is obtained, the null hypothesis is tested by calculating a z,
as the net difference between the two changes divided by its standard error.

The foregoing procedures are also applicable when we are dealing with two
experimental conditions. We need only to use appropriate subscripts in
place of E and C.
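
The equivalence of the two routes for paired groups can be sketched as follows. The scores are invented for illustration, the N of 6 is far too small for a z test in practice, and the standard error of a mean is taken here as S (divisor N) over the square root of N, which is an assumption about formula (6.7), a formula not reproduced in this section.

```python
import math
from statistics import mean, pstdev

# Hypothetical pretest/posttest scores for N = 6 matched pairs.
exp_pre = [12, 15, 11, 18, 14, 16]
exp_post = [10, 11, 10, 13, 12, 11]
ctl_pre = [13, 14, 12, 17, 15, 15]
ctl_post = [12, 13, 12, 15, 13, 14]

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

n = len(exp_pre)
d_e = [a - b for a, b in zip(exp_pre, exp_post)]  # change (pre minus post), experimental member
d_c = [a - b for a, b in zip(ctl_pre, ctl_post)]  # change (pre minus post), control member
d = [e - c for e, c in zip(d_e, d_c)]             # difference between changes, one per pair

# Route 1: treat the N Ds as a single set of scores.
se_short = pstdev(d) / math.sqrt(n)

# Route 2: the longer formula with the correlation between paired changes.
se_e = pstdev(d_e) / math.sqrt(n)
se_c = pstdev(d_c) / math.sqrt(n)
r = pearson_r(d_e, d_c)
se_long = math.sqrt(se_e ** 2 + se_c ** 2 - 2 * r * se_e * se_c)

z = mean(d) / se_short
print(round(se_short, 4), round(se_long, 4), round(z, 2))
```

The two standard errors agree to rounding error, and the mean of the Ds equals the difference between the two mean changes.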
INFERENCE: ESTIMATION


So far we have discussed statistical inference from the point of view of 
hypothesis testing, but there are occasions when we may wish to use 
information from a sample as a basis for estimating population values. 
There are two general types of estimation: point and interval. We shall 
discuss the first briefly in order to introduce some concepts which the 
student might encounter, and the second because of its practical implica- 
tions. 

Point estimation. We may regard a sample statistic as an estimator
for the corresponding population value (parameter). How “good” an 
estimator it is depends on whether or not it is unbiased and consistent, and 
on its relative efficiency. 

An estimator is said to be unbiased if the average of a large number of
sample estimates tends to equal the parameter being estimated. The mean
is unbiased because the mean of sample means will approach nearer and
nearer $\mu$ as we take more and more samples, but $S^2$ defined as $\sum x^2/N$ is
biased in that the mean of sample variances tends to be smaller than the
population variance. An unbiased estimate is given by $s^2 = \sum x^2/(N - 1)$,
but for subtle mathematical reasons s, or $\sqrt{\sum x^2/(N - 1)}$, involves a
negligible bias as an estimator of the population standard deviation.
Note that the bias is small when N is large.
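
The bias of the divisor-N variance can be shown without simulation by listing every possible sample from a tiny population. The sketch below is only an illustration: the population values are hypothetical, sampling is with replacement, and x is taken to be a deviation from the sample mean.

```python
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]   # hypothetical four-score population
N = 3                       # sample size

mu = mean(population)
sigma2 = mean((x - mu) ** 2 for x in population)   # population variance

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample)

# Every equally likely ordered sample of size N drawn with replacement.
samples = list(product(population, repeat=N))

mean_of_means = mean(mean(s) for s in samples)
mean_S2 = mean(sum_sq_dev(s) / N for s in samples)         # divisor N (biased)
mean_s2 = mean(sum_sq_dev(s) / (N - 1) for s in samples)   # divisor N - 1 (unbiased)

print(mu, mean_of_means)          # the mean of sample means equals mu
print(sigma2, mean_S2, mean_s2)   # mean_S2 falls short of sigma2; mean_s2 matches it
```

In this enumeration the average of the divisor-N variances is (N - 1)/N of the population variance, which is why the shortfall shrinks as N grows.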

An estimator is said to be consistent if it approaches nearer and nearer 
the population value as sample size is increased indefinitely. All the 
measures so far discussed satisfy this criterion. 

The efficiency of an estimator is a function of its sampling error. Thus, 
in terms of efficiency the sample mean is far better than the median as an 
estimator of the central value of a population of normally distributed 
scores even though both are unbiased and consistent estimators. 

Interval estimation: confidence interval. Interval estimation, which 
takes into account the sampling error of an estimator, provides limits, or 
an interval, for the population value, and at a prescribed level of confi- 
dence. Given a sample mean and its standard error, one could set up a 
whole series of "trial" hypothesis values for the population mean. All 
trial hypothesis values well above and below the sample mean could be 
rejected at a high level (small P) of significance, but rejection would 
become more and more risky as we approached nearer and nearer the 
sample mean, and for a whole series of values near the sample mean all
trial hypotheses would be acceptable. This implies that at some point
above the sample mean and at some point below the sample mean we
change from rejection to acceptance of the trial values. If we have adopted,
say, the P = .05 level, the change will obviously be at $M \pm 1.96 S_M$. In
rejecting trial values outside these limits and accepting values within these
limits, we are in effect inferring that the population value is in an interval 
defined by these limits. 

It would seem that there should be some way of expressing our degree of
confidence that the population mean lies between the limits $M \pm 1.96 S_M$,
since, as we have seen, we can be somewhat sure that the sample mean is
not a chance deviation from a population mean outside the limits so
determined. Note that, given a population mean and sigma, we can
legitimately speak of the probability of a sample mean falling in a specified
region, but given a sample mean we cannot speak of the probability of the
population mean being in a certain region (or interval) for the simple and
compelling reason that $\mu$, being definitely just one value, has no
distribution. We can in no way enumerate events so as to conceive of a
probability fraction since just one event (value) is possible.

In order to arrive at a statement which expresses our degree of confidence,
we note that, if we draw a second sample, we would be apt to have a
different set of limits for the simple reason that the second sample mean
may differ from the first. If we take additional samples of the same size,
we would have a distribution of sample means, hence a sort of distribution
of sets or pairs of limits, since each sample mean would provide a set. Our
discussion can be greatly simplified by taking sets of limits given by
$M \pm 2\sigma_M$ (as approximating the $M \pm 1.96\sigma_M$ values). For simplicity
of exposition, let us assume that we are drawing successive samples from a
population having a mean of 10, and that the $\sigma$ and N are such that $\sigma_M$ can
be taken as 2. Then $M \pm 2\sigma_M$ will be $M \pm 2(2)$, or $M \pm 4$. It will also
facilitate our exposition if we think of the random sampling distribution of
means in terms of intervals of $\frac{1}{2}\sigma$ distances on the base line with the
approximate percentage area for the several intervals, as shown in the top
curve of Fig. 6.1.

Fig. 6.1. Generation of confidence limits.

Now each possible sample mean will lead to a lower limit of M - 4 and
an upper limit of M + 4. If we consider the 19 per cent of sample means
expected between 9 and 10, we see at once that these 19 will lead to intervals
with lower limits between 5 and 6 and upper limits between 13 and 14.
That is, the sample means falling between 9 and 10 will generate that part
of the lower limit (LL) curve of Fig. 6.1 between 5 and 6 and that part of
the upper limit (UL) curve between 13 and 14. Likewise the 15 per cent
of sample means falling between 8 and 9 will lead to the 4 to 5 part of the
LL curve and to the 12 to 13 part of the UL curve. Similarly, as can be
seen by careful study (a requirement for most students if understanding is
to be achieved) of the three curves of Fig. 6.1, every left-hand segment of
the top curve generates a left-hand segment for each of the bottom curves.
Stated differently, the left half of the top curve leads to a distribution of
intervals with lower limits less than 6 and upper limits of less than 14. In
exactly the same fashion it can be seen that the right half of the top curve
leads to the right half of the LL curve and also the right half of the UL
curve. Thus we have a sampling distribution of intervals (sets of limits) as
found by taking $M \pm 4$ (or $M \pm 2\sigma_M$). Our next task is to ask how many
of these various intervals actually include 10, or the population mean.
Reference to Fig. 6.1 will verify that, out of 100 tries, we would expect to
get:
 4 times an interval with LL of 2 to 3 and UL of 10 to 11
 9 times an interval with LL of 3 to 4 and UL of 11 to 12
15 times an interval with LL of 4 to 5 and UL of 12 to 13
19 times an interval with LL of 5 to 6 and UL of 13 to 14
19 times an interval with LL of 6 to 7 and UL of 14 to 15
15 times an interval with LL of 7 to 8 and UL of 15 to 16
 9 times an interval with LL of 8 to 9 and UL of 16 to 17
 4 times an interval with LL of 9 to 10 and UL of 17 to 18


Notice that for every set of limits in the foregoing groups the population 
mean is in the range or interval defined by the upper and lower limits of the 
set. When we sum these expected frequencies, we see that 94 per cent of 
the sets of limits lead to intervals within which the population mean lies. If 
we had not rounded to the nearest per cent, these would sum to 95.45 per 
cent. This implies that 4.55 per cent of the time the intervals so defined 
would not include the population value. This can be verified by noting 
that sample means of less than 6 (top curve) lead to upper limits of less than
10, and do so 2.27 per cent of the time, whereas sample means of more
than 14 produce lower limits of more than 10 about 2.27 per cent of the
time. These percentages are for the tails of the bottom curves, to the left 
of the ordinate at 10 for the UL curve and to the right of this ordinate for 
the LL curve. 
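
The percentages in this illustration can be reproduced from the unit normal curve. The following sketch uses the same population mean of 10 and standard error of 2; the helper name `phi` and the layout of the printout are choices made here, not part of the original exposition.

```python
import math

def phi(z):
    """Cumulative area of the unit normal curve up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma_m, k = 10.0, 2.0, 2.0
half_width = k * sigma_m                      # limits are M - 4 and M + 4

# Half-sigma bands of sample means from mu - 2 sigma_M to mu + 2 sigma_M.
edges = [mu + sigma_m * h / 2 for h in range(-4, 5)]      # 6, 7, ..., 14
bands = []
for lo, hi in zip(edges, edges[1:]):
    pct = 100 * (phi((hi - mu) / sigma_m) - phi((lo - mu) / sigma_m))
    bands.append((lo, hi, pct))
    print(f"{pct:5.2f}%  M {lo:g} to {hi:g}  "
          f"LL {lo - half_width:g} to {hi - half_width:g}  "
          f"UL {lo + half_width:g} to {hi + half_width:g}")

rounded_total = sum(round(pct) for _, _, pct in bands)    # sum of whole per cents
coverage = 100 * (phi(k) - phi(-k))                       # exact area within 2 sigma_M
tail = 100 * phi(-k)                                      # area in each excluded tail
print(rounded_total, round(coverage, 2), round(tail, 2))
```

Every interval generated by a sample mean between 6 and 14 includes 10, so the coverage is simply the area of the sampling distribution within two standard errors of the population mean.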

In summary, if we were to make in our lifetime 100 inferences concerning
population means on the basis of sample values by each time taking the
limits as $M \pm 2\sigma_M$, the limits so established would include the population
value about 95 per cent of the tries. That is, in the long run we would be
correct about 95 per cent of the time in concluding that the population
value is within the intervals so determined, and about 5 per cent of the
time we would be in error. If we used $M \pm 1.96\sigma_M$ for setting limits, we
would be correct 95 per cent, and in error 5 per cent, of the time. When we
take $M \pm 1.96\sigma_M$ as a confidence interval, the degree of faith in such limits
is represented by a P of .95; i.e., the level of confidence for such an
inference is represented by a probability-type figure of .95. If we wish to be
surer of our inferences, we might choose the .99 level of confidence, which
in practice can be attained by taking $M \pm 2.58\sigma_M$ as limits.

The limits set by the confidence interval method are so very similar to 
fiducial limits, and the level of confidence, sometimes referred to as the 
confidence coefficient, is so much like fiducial probability that the beginning
student can well let the mathematical statistician worry about the theoreti- 
cal difference between what seems to be two ways of doing the same thing. 

The preceding illustration of the meaning of interval estimation was
based on a presumed known $\sigma$; in practice we will have a sample estimate,
S, hence $S_M$, as a basis for calculating limits. Since $S_M$ will vary from
sample to sample (because of varying Ss), the width of the interval will
vary from sample to sample and, therefore, it might be inferred that using
$M \pm 1.96 S_M$ would not lead to intervals that overlap $\mu$ 95 per cent of the
time. But since the width of the interval will sometimes be too short and
sometimes too long, there is a balancing effect for N not too small.

Confidence intervals can be set up for statistical measures other than the 
mean, but if the random sampling distribution of a given measure is 
nonnormal, the method will not be the simple stunt of taking $C \pm 1.96 S_C$
or $C \pm 2.58 S_C$, where C stands for any statistical measure. It should be
obvious that, since the standard errors for all statistical measures are a 
function of N, it is possible by increasing the sample size to narrow the 
confidence interval without any loss in the degree of confidence with which 
we accept the limits. 

Confidence interval for a difference. There are times when it is desir- 
able not only to know whether a difference is significant but also to specify 
limits for the population difference. Such specification does not presume 
that a significant difference has been found. Even when a difference fails
to reach significance, the specification of confidence limits gives some idea
of the possible difference between population values, and such information 
may help answer the nonstatistical question of whether the population 
difference is apt to be large enough to be of practical or scientific import- 
ance. This procedure may be helpful in evaluating the consequences of 
accepting the null hypothesis when in reality the hypothesis is false. 
Furthermore, the setting up of a confidence interval may be particularly 
helpful when we have obtained a difference which is highly significant. 
Consider the case of a difference of 4.78 inches in mean height between 
men and their sisters. Because of large Ns and the presence of
brother-sister correlation, the standard error of the difference is very
small; its value is about .07. When we compute $D/S_D$ we have a z of 68.
This would, if we could evaluate it, yield a probability, for as large a
difference by chance, which would be so microscopically small that we could
not comprehend it. However, when we set confidence limits at, say, the .99
level, we have $4.78 \pm 2.58(.07)$, or 4.60 and 4.96, as limits for the
population difference. This permits a down-to-earth way for evaluating the
obtained difference.
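
The arithmetic of the height example can be reproduced in a few lines. The function name below is hypothetical; the figures 4.78, .07, and 2.58 are those given in the text.

```python
def confidence_limits(estimate, std_error, z=1.96):
    """Lower and upper limits as estimate -/+ z standard errors (normal sampling assumed)."""
    half = z * std_error
    return estimate - half, estimate + half

d, se = 4.78, 0.07            # mean height difference and its standard error
z_ratio = d / se              # the significance ratio D / S_D
low, high = confidence_limits(d, se, z=2.58)   # .99 level of confidence
print(round(z_ratio), round(low, 2), round(high, 2))
```

The same function with its default z of 1.96 gives limits at the .95 level.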

Level of confidence vs. level of significance. The term “level of
confidence” should not, as it frequently is, be misused in place of “level
of significance.” The first term pertains to interval estimation, the other
to hypothesis testing.


QUESTION OF ASSUMPTIONS 


It may be well to consider briefly the assumptions underlying the pro- 
cedures so far discussed for making statistical inferences, since assumptions 
restrict the applicability of a method. 

Independence of sampling units. It is assumed that the conditions
of random sampling hold, but the frequency with which the requirement of
independence is violated by researchers suggests that a warning is needed.
The violation usually comes about when multiple measurements or
observations are made on each of the individuals in a sample and each
measurement (or response) is treated as a sample value, thereby inflating N
n-fold when n repeated measurements (or responses) are available for
each person. The lack of independence comes about in that, for instance,
if the sample of individuals happened to include one high scoring person
there would automatically be n high scores. The effect of such an inflation
of N is an illegitimate reduction in standard errors.

Infinite vs. finite universe. If we are sampling from a finite universe,
particularly a universe with a rather small number of cases, it seems
reasonable to think that as the sample size becomes large relative to the
number of cases in the universe, the sample mean, for example, will tend
to fluctuate less from the universe mean than it does when we are drawing
from an infinite population. This suggests that the standard error formulas
need to be modified for the finite population situation. The required
modifications are available for only a few statistical measures. If we let N
represent the sample size and $N_{pop}$ the size of the finite universe, the
standard errors for the mean and for a proportion are approximately as
follows:

$$S_M = \frac{S}{\sqrt{N}} \sqrt{1 - N/N_{pop}} \quad \text{and} \quad S_p = \sqrt{pq/N} \, \sqrt{1 - N/N_{pop}}$$

In a given research it is sometimes difficult to decide whether the universe 
being sampled is finite or infinite in size, and, if finite, it is not always easy 
to determine the value of $N_{pop}$. It might be argued that psychologists never
study an infinite universe. It can readily be seen, however, that the
corrective factor in the sampling error formulas becomes negligible when
$N_{pop}$ is large. Thus, if $N_{pop}$ is known to be large relative to N, it matters
little whether the given universe is wrongly conceived as being infinite. For
example, when N is .01 of $N_{pop}$, the corrective term leads to a reduction in
the sampling error of about .005 of the value obtained by the ordinary 
formulas. 
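
A short sketch of the finite-universe correction follows. The function names are hypothetical and the formulas are the approximate ones given above; the printout checks the stated reduction of about .005 when the sample is one hundredth of the universe.

```python
import math

def fpc(n, n_pop):
    """Corrective factor sqrt(1 - N / N_pop) for sampling from a finite universe."""
    return math.sqrt(1 - n / n_pop)

def se_mean_finite(s, n, n_pop):
    """Approximate standard error of a mean for a finite universe."""
    return s / math.sqrt(n) * fpc(n, n_pop)

def se_prop_finite(p, n, n_pop):
    """Approximate standard error of a proportion for a finite universe."""
    return math.sqrt(p * (1 - p) / n) * fpc(n, n_pop)

reduction = 1 - fpc(100, 10_000)      # N is .01 of N_pop
print(round(reduction, 4))            # proportionate reduction in the standard error
print(round(se_mean_finite(10, 100, 400), 4), round(10 / math.sqrt(100), 4))
```

With N a quarter of the universe, as in the second line of output, the correction is no longer negligible.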

These formulas for the finite universe situation are frequently useful
when we wish to compare a subgroup with a total group which contains the
subgroup. Such a comparison is sometimes erroneously made by taking
$\sqrt{S_s^2/N_s + S_t^2/N_t}$ as the standard error of the difference between the
subgroup mean, $M_s$, and the total mean, $M_t$. This makes no allowance for
the fact that the two means are not based on independent groups. An
appropriate procedure is to regard $M_s$ as based on a sample drawn from a
finite universe of $N_t$ cases with mean and standard deviation of $M_t$ (as $\mu$)
and $\sigma_t$; then with the standard error of $M_s$ taken as

$$\sigma_{M_s} = \frac{\sigma_t}{\sqrt{N_s}} \sqrt{1 - N_s/N_t}$$

we can test the significance of the deviation of $M_s$ from $M_t$ by using the
ratio $(M_s - M_t)/\sigma_{M_s}$, which is interpretable as a z. This ratio will give a
very close approximation to the z which would be obtained if we were to
compare the subgroup with the remainder (the total cases less the subgroup)
as two independent groups, using the usual formula for standard error of
the difference. The foregoing scheme would also be applicable in case
proportions instead of means were the descriptive measures used as a basis
for comparison.
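
The subgroup-versus-total comparison can be sketched as below. The function name and all figures are hypothetical; the standard error is the finite-universe form just described, with the total group playing the part of the universe.

```python
import math

def z_subgroup_vs_total(m_s, n_s, m_t, sd_t, n_t):
    """z for the deviation of a subgroup mean from the mean of the total group containing it."""
    se = sd_t / math.sqrt(n_s) * math.sqrt(1 - n_s / n_t)
    return (m_s - m_t) / se

# Hypothetical: a subgroup of 50 with mean 52 within a total of 500 (mean 50, SD 10).
z = z_subgroup_vs_total(m_s=52, n_s=50, m_t=50, sd_t=10, n_t=500)
print(round(z, 2))
```

As the subgroup approaches the size of the total group the standard error shrinks toward zero, which reflects the fact that the two means can then hardly differ.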

Skewed distributions. The standard error formulas given in this
chapter assume normal or nearly normal score distributions for the
population being sampled. Skewness is the most frequently encountered
evidence for nonnormality, and accordingly it is of interest to consider the
effect of skewness on the sampling distribution of the mean, the measure
most apt to be involved in testing hypotheses. The relationship between the
degree of skewness, $g_1$, for a variable and the amount of skewness for the
sampling distribution of means is $g_M = g_1/\sqrt{N}$. Thus the skewness in the
distribution of means rapidly disappears as N is taken larger and larger.
For example, if $g_1$ is .77 (see Fig. 3.1, p. 27) and N is 35, the skewness for
the sampling distribution of means will be only .13 (see Fig. 3.1 again).
Accordingly, the procedures in this chapter may be safely used with
moderately skewed distributions when N is large and with markedly
skewed distributions when N is very large. Some methods for handling
nonnormal data will be discussed in Chapter 19.
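
The rate at which skewness disappears from the sampling distribution of means can be tabulated directly from the relationship just given. The function name and the choice of sample sizes are illustrative only.

```python
import math

def skew_of_means(g1, n):
    """Skewness of the sampling distribution of means for samples of size n."""
    return g1 / math.sqrt(n)

for n in (5, 35, 100, 400):
    print(n, round(skew_of_means(0.77, n), 2))
```

For N of 35 the value is .13, the figure cited in the text.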


A FURTHER WORD ON PROPORTIONS 


The student will have noted that the general principles of statistical 
inference set forth in Chapter 5 have been utilized and extended in the 
present chapter. There are many points of obvious similarity in the two 
chapters, but there is an additional parallelism which is not obvious. For 
an attribute involving a dichotomy such as yes-no, like-dislike, pass-fail, 
etc., we may arbitrarily assign a score of 1 to one category and a score of 0
to the other. That is, X = 0 or 1.


Table 6.2. Scheme for mean and standard deviation of a dichotomous variable

Response    X    f        fX                  fX^2
Yes         1    $f_1$    $f_1(1)$            $f_1(1)^2$
No          0    $f_0$    $f_0(0)$            $f_0(0)^2$
Sums             N        $f_1 = \sum X$      $f_1 = \sum X^2$


Let $f_0$ and $f_1$ stand for the frequency of, say, no and yes responses
respectively in a sample of N cases. Thus we have a miniature frequency
distribution, with the two categories being analogous to two intervals.
Let us consider the mean and standard deviation of this miniature frequency
distribution, both in terms of gross score formulas. Notice that in Table
6.2 we have a score column, X, a frequency column, f, an fX and an $fX^2$
column (analogous to fd and $fd^2$, with d = X). It will be seen that
$\sum X = f_1$; hence the mean of the distribution is $M = \sum X/N = f_1/N = p$,
where p is the proportion of yeses. Hence a proportion may be regarded as
a mean.
It will also be seen that $\sum X^2 = f_1$; hence when we utilize formula (3.7)
to write the variance of the distribution we have


$$S^2 = \frac{1}{N^2}\left[N \sum X^2 - \left(\sum X\right)^2\right] = \frac{1}{N^2}\left[N f_1 - f_1^2\right] = p - p^2 = p(1 - p) = pq$$


Hence $S = \sqrt{pq}$ as the standard deviation of the dichotomous distribution.
(Any connection with the binomial?)

In this chapter we have given $S_M = S/\sqrt{N}$ as the standard error of a
mean. If this holds for the dichotomous distribution we would have
$S_M = \sqrt{pq}/\sqrt{N} = \sqrt{pq/N}$. But this is the same as $S_p$ given by formula
(5.2). This is as it should be since p = M for the dichotomous distribution.
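
The parallel between a proportion and a mean can be verified on a small set of 0 and 1 scores. The data below are hypothetical (30 yeses and 70 noes), and the standard deviation is the divisor-N form used in this chapter.

```python
import math
from statistics import mean, pstdev

scores = [1] * 30 + [0] * 70      # hypothetical responses: yes = 1, no = 0
n = len(scores)

p = mean(scores)                  # mean of the 0/1 scores, i.e., proportion of yeses
q = 1 - p
sd = pstdev(scores)               # standard deviation with divisor N

se_as_mean = sd / math.sqrt(n)            # S / sqrt(N)
se_as_prop = math.sqrt(p * q / n)         # sqrt(pq / N)
print(p, round(sd, 4), round(math.sqrt(p * q), 4))
print(round(se_as_mean, 4), round(se_as_prop, 4))
```

The standard deviation of the scores equals the square root of pq, and the two standard errors coincide.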

Furthermore, formula (5.5) for the standard error of the difference 
between correlated proportions has its analogue in the development on 
pp. 80-82 for the difference between correlated means, and formula (5.6) 
involves a pattern similar to that of formula (6.9). 


NOTE ON THE PROBABLE ERROR 


An antiquated procedure is the use of the probable error, pe, instead of 
the standard error in connection with sampling. The pe of the mean is
$.6745 S_M$, and therefore we would expect 50 per cent of successive sample
means to fall within $\mu \pm pe_M$. Similarly, the pe for any other statistical
measure is .6745 times its standard error. The student who attempts to 
survey the research literature on a given topic is apt to encounter pes and 
he therefore must know the relationship of the pe to the standard error. 
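
For converting between the two measures when reading older reports, a small sketch follows. The function names are hypothetical; `NormalDist` from the Python standard library is used only to confirm that .6745 is the unit normal deviate cutting off the middle 50 per cent.

```python
from statistics import NormalDist

PE_FACTOR = 0.6745                          # ratio of probable error to standard error

def probable_error(std_error):
    """Probable error corresponding to a given standard error."""
    return PE_FACTOR * std_error

def standard_error_from_pe(pe):
    """Standard error recovered from a reported probable error."""
    return pe / PE_FACTOR

z_middle_half = NormalDist().inv_cdf(0.75)  # deviate with 25 per cent of the area above it
print(round(z_middle_half, 4))
print(round(standard_error_from_pe(1.35), 2))   # a reported pe of 1.35 implies this standard error
```

A difference reported as so many times its pe should therefore be multiplied by .6745 before being read as a z.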


NOTE ON NOTATION 


We have used the Greek letter $\mu$ as the symbol for population mean and
the corresponding Latin letter M for a sample mean. Another frequently
used symbol for a sample mean is $\bar{X}$ (read X bar); later in this text we will
use the bar to indicate a sample mean. The student needs to know both M
and $\bar{X}$ as symbols. We have used $\sigma$ as a symbol for the standard deviation
of a population and also for the standard deviation of a theoretical
distribution, such as the binomial or (the definition formula of) the normal
curve; the Latin equivalents, S and s, stand for sample standard deviation,
one biased the other unbiased. As shall be seen, we need in the sequel
both S and s. Consistency in notation would call for p (or P) as a sample
proportion and the corresponding Greek $\pi$ as a population value, but the
use of $\pi$ was long ago taken by mathematicians as the symbol for something
else, so to avoid confusion we used $p_{pop}$ instead of $\pi$. Later we will use r
and $r_{pop}$ as symbols for sample and population correlation coefficients
because $\rho$ (rho) has, as we shall see, been used to signify a particular kind of
correlation coefficient.


Chapter 7 


SMALL SAMPLE OR 
t TECHNIQUE 


Although the general principles of statistical inference are the same for 
both large and small samples, the techniques differ. We shall confine our 
attention in this chapter to the technique for dealing with a single mean and 
with the difference between two means. Chapter 14 will deal with infer- 
ences concerning variabilities. 

It will be recalled that the sampling distribution of the mean is normal
when the trait distribution is normal. This holds regardless of sample size.
The sampling distribution of means centers at the population mean with a
true standard deviation $\sigma_M = \sigma/\sqrt{N}$, which sigma we termed
the true standard error of the mean. Recall also that the relative deviates,
$(M - \mu)/\sigma_M$, follow the unit normal curve. When successive samples
are drawn and an $S_M$ is computed for each sample by using the sample S
instead of σ (an unknown), the ratios of given (M − μ)s to their $S_M$
values so computed will be distributed normally for very large Ns and
approximately so for Ns of moderate size, but for Ns as small as 30 the
approximation is none too good. The value 30 is arbitrarily chosen; the
approximation to normality becomes progressively worse as we go from
large to small Ns rather than becoming abruptly worse in the vicinity of
N = 30.

We have already mentioned the fact that $S^2 = \Sigma x^2/N$ suffers from
bias, whereas $s^2 = \Sigma x^2/(N - 1)$ is an unbiased estimator of the
population variance. Since the bias in S increases with a decrease in N, it
is important to use the unbiased estimator when N is small. We will
accordingly use $s_M = s/\sqrt{N}$, in place of $S_M = S/\sqrt{N}$, as a
nonnegligible improvement in the estimate of the standard error of a mean
based on a small sample. Even so, the successive sample ratios,
$(M - \mu)/s_M$, with $s_M$ computed from each sample, will not follow the
unit normal curve because the sampling distribution of s (also S) is skewed
for N small; hence the distribution of successive values of $s_M$ will be
skewed. That is, the successive sample values of $(M - \mu)/s_M$ will
involve a variable numerator which is normally distributed and a variable
denominator which has a skewed distribution. The distribution of the
resulting ratios will be symmetrical about zero but will be leptokurtic.
That is, it is characteristic of the sampling distribution of
$(M - \mu)/s_M$ that the tails of the curve beyond a ratio of about 1.7 tend
to be higher than the tails of the normal curve. Thus, there will be
relatively more large ratios.

The t distribution. It can be shown that such ratios, involving a
normally distributed deviate divided by an unbiased estimate of its
sampling error, will follow the so-called t distribution, defined by

$$y = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}$$

in which Γ indicates the gamma function as defined in texts in advanced
calculus. Although this equation will be beyond the mathematical
comprehension of most students, it should be noted that y is the height of a
curve, that since t is squared the distribution is symmetrical, and that the
equation contains an n as yet undefined. This n has to do with the number
of degrees of freedom, a concept which is to be discussed. Suffice it to say
just now that n will be a function of sample size (or sizes) and accordingly
that there will be not just one but many distributions of t, one for each
possible value of n.

Figure 7.1 shows the curve of t, when n = 7 and when n = 3, as compared
to the normal curve. For n larger and larger, the curve of t
approaches that of the normal distribution. Table E of the Appendix gives
the values of t, for ns of 1 to 30, which will be exceeded by chance a
specified proportion of times. Thus for n = 30 we see from Table E that the
P = .05 point is at a t of 2.04 as compared to a normal deviate of 1.96. For
n = 10, the point corresponding to the .05 level is t = 2.23. The .01 level
is at t = 2.75 for n = 30, and at 3.17 for n = 10, as compared with 2.58 for
the normal curve.

Fig. 7.1. Normal compared with t distribution for n = 3 and n = 7.

Degrees of freedom. The n of the equation for t, and in the t table,
is the number of degrees of freedom (df) involved in the estimate of the
population variance. The df depends on how many of the xs in $\Sigma x^2$,
or $\Sigma(X - M)^2$, are "free to vary." Suppose two scores, 3 and 5. Their
mean is 4, and the sum of squares (of deviations) is
$(3 - 4)^2 + (5 - 4)^2 = 2$. Now
$\Sigma x = \Sigma(X - M) = \Sigma X - \Sigma M = \Sigma X - NM = \Sigma X - N(\Sigma X/N) = 0$,
always. Therefore, as soon as one of two deviations is known, the
other x is determinable. Thus, if $x_1$ is −1, the other deviation, $x_2$,
must satisfy the equation $-1 + x_2 = 0$. One deviation and hence its square
can be thought of as dependent on the other deviation, which has some
independence, and therefore 1 degree of freedom. Suppose that we have three
scores, 3, 4, and X, which yield a mean of 4. The deviations must satisfy
the requisite that they sum to zero; i.e., (3 − 4) + (4 − 4) + (X − 4) = 0.
Thus one of the three deviations is fixed by the other two, i.e., is not
independent of their values, because the three deviations must sum to zero.

It may be more enlightening to start with symbols for scores. Suppose
that $X_1$, $X_2$, $X_3$, and $X_4$ represent four scores, and it is
reported that their mean equals 40. How many of the four deviations can we
assign at will? Stated in deviation units, we have
$(X_1 - 40) + (X_2 - 40) + (X_3 - 40) + (X_4 - 40)$ as a sum which must
equal zero. It is readily apparent that only three deviations can "vary
freely"; the fourth is fixed by the numerical values of the other three.
Hence df = 4 − 1; i.e., 1 degree of freedom in the deviations or their
squares is lost because of the one restriction imposed. Actually, this
restriction comes about because we are taking deviations about one constant,
the mean, computed from the set of scores at hand. The df for a sum of
squares (of deviations) about a mean is always N − 1 when N scores are used
to compute the mean. In general, the df for the sum of squares is equal to
the number of squares minus the number of restrictions imposed by constants
computed from the data.




Note that the unbiased estimate of the population variance,
$s^2 = \Sigma x^2/(N - 1)$, involves dividing by df, the number of degrees
of freedom. This is a general rule.

Computation of $s^2$ or s. For N small the mean and $s^2$ or s are readily
computed from gross score formulas. Thus $M = \Sigma X/N$. To compute $s^2$
or s we need $\Sigma x^2$ in terms of gross scores. This was given earlier as

$$\Sigma x^2 = \frac{1}{N}\left[N\Sigma X^2 - (\Sigma X)^2\right] \qquad (3.6)$$

Division of this by N − 1 yields $s^2$, the square root of which is the
required s. An easily derived relationship between $s^2$ and $S^2$ is
$s^2 = \frac{N}{N - 1}S^2$.
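
The computation may be sketched in Python as follows. This is an
illustration rather than a prescribed routine; the seven scores are
hypothetical and the function name is an assumption. The sum of squares is
obtained from gross scores by formula (3.6), then divided by N − 1 and by N
to give $s^2$ and $S^2$.

```python
from math import sqrt

def sum_of_squares(scores):
    """Sum of squared deviations about the mean, from gross scores (3.6)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (n * sum_x2 - sum_x ** 2) / n

scores = [12, 15, 9, 14, 10, 13, 11]   # hypothetical scores
n = len(scores)
ss = sum_of_squares(scores)
s2 = ss / (n - 1)    # unbiased estimate, divisor is df
S2 = ss / n          # biased estimate, divisor is N
print("sum of squares:", ss)
print("s^2, s:", s2, sqrt(s2))
print("S^2, S:", S2, sqrt(S2))
print("N/(N-1) times S^2:", n / (n - 1) * S2)
```

The last line printed should reproduce $s^2$, in keeping with the
relationship just stated.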


Although we do not need a frequency distribution for purpose of compu- 
tations, a distribution should be made anyway so as to permit at least a 
rough check on the assumption that the scores have been drawn from a 
normally distributed population of scores. 

t for a single mean. We can test the significance of M as a deviation
from any hypothesized value for the mean, $M_h$, by taking
$t = (M - M_h)/s_M$ as an entry in Table E, with n = df = N − 1, to see
whether the obtained t reaches the t value required for certain levels of
significance. If the t does not reach the value required for the chosen
level of significance, the deviation would be attributed to chance and the
hypothesis accepted.

If we wish to specify the confidence limits for the unknown population
mean and to do so with a level of confidence indicated by P = .99, we first
note from the table of t how large t must be, for the given df, to
correspond to the .01 probability level. Then M plus and minus the t, so
found, times $s_M$ will give the desired limits. For example, suppose nine
cases yield a mean of 80 and a sum of squares of 1152. Dividing the sum of
squares by df, or 8, we get $s^2 = 144$, s = 12 as an estimate of σ, and
$s_M = 12/\sqrt{9} = 4$. For 8 df we find from Table E that t = 3.355 for
the .01 level. Then 80 ± (3.355)(4) gives 66.58 and 93.42 as the .99
confidence limits for the population mean. If we used the large sample
method of Chapter 6 we would have $S^2 = 1152/9$, giving S as 11.31, from
which we would get $S_M = 11.31/\sqrt{9} = 3.77$. Since for the normal
distribution a relative deviate of 2.575 corresponds to the .01 level, we
have 80 ± (2.575)(3.77) or 70.29 and 89.71 as the .99 confidence limits for
the universe mean. These values for the confidence interval differ
appreciably from those obtained previously when proper allowance was made
for the smallness of the sample.
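
The arithmetic of this example can be retraced with a short Python sketch,
given here only as an illustration. The value 3.355 is the tabled t quoted
above for 8 df; the normal deviate is taken from the standard library rather
than rounded to 2.575, so the last decimal of the large sample limits may
differ very slightly from hand computation.

```python
from math import sqrt
from statistics import NormalDist

n, mean, sum_sq = 9, 80.0, 1152.0
t_01 = 3.355            # Table E, df = 8, .01 level (quoted in the text)

# Small sample method: unbiased s, tabled t.
s_m = sqrt(sum_sq / (n - 1)) / sqrt(n)
small_limits = (mean - t_01 * s_m, mean + t_01 * s_m)

# Large sample method of Chapter 6: biased S, normal deviate.
S_m = sqrt(sum_sq / n) / sqrt(n)
z_01 = NormalDist().inv_cdf(0.995)   # two-tailed .01 level
large_limits = (mean - z_01 * S_m, mean + z_01 * S_m)

print("t limits:", [round(v, 2) for v in small_limits])
print("z limits:", [round(v, 2) for v in large_limits])
```

The interval based on t is wider at both ends, which is the allowance made
for the smallness of the sample.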

Difference between correlated means. It will be recalled that when
we have two means based on the same individuals or on paired cases, the
test of significance of the difference must make allowance for the fact that
the two sets of scores are not random with respect to each other. In
Chapter 6 we saw that this could be done by including the r term in the
standard error of the difference, as in formula (6.8), or by working
directly with the differences between paired scores. It was shown that
$M_D = D_M$ and that $S_{M_D} = S_{D_M}$. When we have small samples, it is
easier to work with $M_D$, with $s_D$, an estimate of the σ of the
distribution of differences between paired scores, and thence with
$s_{M_D}$. To get the best estimate of the sampling error of $M_D$, we need
the sum of squares of the deviations of the pair differences from the mean
difference, i.e., $\Sigma(D - M_D)^2$, which when divided by the proper df,
or N − 1, where N is the number of differences or the number of paired
scores, gives the best estimate of the variance of the universe distribution
of differences. Let $s_D^2$ stand for this estimate. Then

$$s_{M_D} = \frac{s_D}{\sqrt{N}} \qquad (7.1)$$


The computation is straightforward. Each of the Ds is the difference
between two scores, the subtraction being made in the same direction for
all, and the sum of squares, $\Sigma(D - M_D)^2$, is obtained by formula
(3.6) with the Xs replaced by Ds; that is,

$$\Sigma(D - M_D)^2 = \frac{1}{N}\left[N\Sigma D^2 - (\Sigma D)^2\right]$$

The Ds are summed algebraically, and their squares are summed. After
$s_{M_D}$ has been calculated, we get t as $M_D/s_{M_D}$. The hypothesis to
be tested is that the universe value of $M_D$ is zero; the table of t is
entered with the obtained t and with df = N − 1 in order to see whether it
reaches a prescribed level of significance. Note that the df is 1 less than
the number of Ds, not 1 less than the total number of scores (see "Further
note" on dfs, p. 104).

The assumption of normality pertains to the Ds; hence, again, even
though a frequency distribution is not needed for computational purposes,
it should be made so as to provide a rough check on the assumption. A
confidence interval for $M_D$ (and consequently $D_M$) can be set up in
precisely the same manner as indicated previously for a single mean.
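
As an illustration of the steps for correlated means, the following Python
sketch computes t from paired scores. The scores for six paired cases are
hypothetical and the function name is an assumption; the significance of the
result must still be judged by entering the table of t with the df returned.

```python
from math import sqrt

def paired_t(first, second):
    """Return (t, df) for the mean of the differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sum_d = sum(diffs)                    # Ds summed algebraically
    sum_d2 = sum(d * d for d in diffs)    # squares of the Ds summed
    sum_sq = (n * sum_d2 - sum_d ** 2) / n   # formula (3.6) applied to Ds
    m_d = sum_d / n
    s_d = sqrt(sum_sq / (n - 1))          # estimate of sigma of differences
    s_md = s_d / sqrt(n)                  # formula (7.1)
    return m_d / s_md, n - 1

# Hypothetical scores for six paired cases.
second_trial = [12, 15, 9, 17, 12, 16]
first_trial = [10, 12, 9, 14, 11, 13]
t, df = paired_t(second_trial, first_trial)
print("t =", round(t, 3), "with df =", df)
```

Note that the df is 5, one less than the number of differences, although
twelve scores entered the computation.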

Difference between independent means. Given: two groups of $N_1$
and $N_2$ cases, and that we wish to test the significance of the
difference, $D_M = M_1 - M_2$. By the procedure of Chapter 6 for large Ns,
we would make the necessary calculations for determining $D_M/S_{D_M}$, or
z. As an aid to transition in thought from z to t, let us first write the
expression for z, thus,

$$z = \frac{D_M}{S_{D_M}} = \frac{M_1 - M_2}{\sqrt{S_{M_1}^2 + S_{M_2}^2}} = \frac{M_1 - M_2}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}}$$

which involves the two sample variances. Now, for the small sample
situation, we need $t = D_M/s_{D_M}$, where $s_{D_M}$ is to be the best
possible estimate of the standard error of the difference. To get this we
apparently need the best possible estimates of the two variances of the two
populations from which the samples have been drawn. But here we encounter an
assumption underlying t for this situation: the two populations must have
the same variance. Hence, we need just one estimate, an estimate of the
variance common to the two populations. Calling this estimate $s^2$, by
analogy with the z technique, we need

$$t = \frac{M_1 - M_2}{\sqrt{\dfrac{s^2}{N_1} + \dfrac{s^2}{N_2}}}$$

The best estimate, $s^2$, of the common population variance is obtained by
computing the sum of squares separately for the two samples, then combining
these sums, and dividing by the proper df, or

$$s^2 = \frac{\Sigma(X - M_1)^2 + \Sigma(X - M_2)^2}{N_1 + N_2 - 2}$$


The two separate sums are computed by formula (3.6). Note that 2
degrees of freedom are lost because the sum of squares is about two means,
which leads to two restrictions. Substitution of the obtained $s^2$ in the
foregoing expression leads to a t, which is looked up in Table E with df,
or n, equal to $N_1 + N_2 - 2$ in order to see whether it reaches a chosen
level of significance.
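
The procedure for independent means may be sketched in Python as follows.
The two groups of five scores are hypothetical and the function names are
assumptions made for illustration; the sketch presumes, as does the t
technique itself, that the two populations have a common variance.

```python
from math import sqrt

def sum_of_squares(scores):
    """Sum of squared deviations about the group's own mean, formula (3.6)."""
    n = len(scores)
    return (n * sum(x * x for x in scores) - sum(scores) ** 2) / n

def independent_t(group1, group2):
    """Return (t, df) using one pooled estimate of the common variance."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    df = n1 + n2 - 2                      # two means, two restrictions
    s2 = (sum_of_squares(group1) + sum_of_squares(group2)) / df
    s_dm = sqrt(s2 / n1 + s2 / n2)        # standard error of the difference
    return (m1 - m2) / s_dm, df

# Hypothetical scores for two independent groups.
group1 = [14, 11, 15, 12, 13]
group2 = [9, 12, 10, 8, 11]
t, df = independent_t(group1, group2)
print("t =", round(t, 3), "with df =", df)
```

With these scores each sum of squares is 10, so that $s^2 = 20/8 = 2.5$ and
the standard error of the difference is 1.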

There is one point in the method of determining the $s^2$, needed for
testing the significance of the difference between means, which may have
puzzled the student. The setting of the null hypothesis, in combination with
the assumption of equal population variances, implies that the two samples
have been drawn from a single universe or from two universes which have
the same mean and equal variances, for the given and measured trait. It
might accordingly be assumed that the best estimate of the population
variance would be obtained by taking the sum of squares about the combined
mean rather than about the separate means. The former would give the better
estimate of the variance if it were actually known that the two universe
means were the same (or that only one universe was involved), but there is
always the possibility that the two universe means really differ. If they do
differ, the taking of the sum of squares about the combined mean would, in
general, yield too large an $s^2$ for the simple reason that the real
difference between groups would be contributing to the variability of the
two groups combined. (The student who has difficulty seeing this point
should imagine what would happen to the variance of scores when two
groups markedly different in means were combined.) It follows, therefore,
that in the long run the best value for $s^2$ will be provided by summing
the sums of squares about the two means.

The procedure for setting a confidence interval when we have independent
means is no different from that for correlated means. Simply take
$D_M \pm t_\alpha s_{D_M}$, where $t_\alpha$ is the t, for the given df,
required for significance at the P = α level. This will give limits for the
P = 1 − α level of confidence. Suppose we wish the .99 confidence interval;
this requires an α of .01, or as sometimes written, $t_\alpha = t_{.01}$,
where $t_{.01}$ is found under the P = .01 column, opposite the df.

Further note on degrees of freedom. Suppose two independent groups
with $N_1 = N_2 = N$, and also two groups of scores based on N cases (or N
paired persons). For the former the df is $N_1 + N_2 - 2 = 2N - 2$,
whereas for the latter the df is N − 1 even though in the paired situation
the total number of persons is 2N. This may be (and has been) confusing
to some; it seems as though the obviously better plan (matching) leads to a
loss in df compared to the setup involving independent groups. It is
sometimes argued that the df would perhaps be larger if we worked not
with the difference scores but with the two sets of scores in terms of the
sums of squares of deviations for each set and the sum of cross products
since, as can be seen from p. 81,

$$\Sigma(D - M_D)^2 = \Sigma x_1^2 + \Sigma x_2^2 - 2\Sigma x_1 x_2$$

The df for the left-hand sum of squares is obviously N − 1, and since the
right-hand side of the equation is merely an algebraic variant of the
left-hand side, it does not seem reasonable to believe that the dfs will
differ for the two sides. Note that if we consider $\Sigma x_1^2$ as having
N − 1 degrees of freedom, we cannot have any more degrees of freedom for the
other sums on the right side because the $x_2$ values are not independent of
the $x_1$ values; they (the $x_2$ scores) are not "free to vary."
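
That the two sides of the equation are merely algebraic variants of one
another can be verified on any set of paired scores. The Python sketch below
is illustrative only; the six pairs are hypothetical.

```python
def deviations(scores):
    """Deviations of the scores from their own mean."""
    mean = sum(scores) / len(scores)
    return [x - mean for x in scores]

# Hypothetical scores for six paired cases.
set1 = [12, 15, 9, 17, 12, 16]
set2 = [10, 12, 9, 14, 11, 13]

x1 = deviations(set1)
x2 = deviations(set2)
d = deviations([a - b for a, b in zip(set1, set2)])   # D - M_D for each pair

left = sum(v * v for v in d)
right = (sum(a * a for a in x1) + sum(b * b for b in x2)
         - 2 * sum(a * b for a, b in zip(x1, x2)))
print(left, right)
```

Since the two quantities are identical for every set of pairs, no rewriting
of the left-hand sum of squares can add to its N − 1 degrees of freedom.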

Comparison of changes. In Chapter 6 (p. 86) we discussed the procedures
for testing the differences between changes shown by two groups.
For the situation involving paired persons, a D for the difference between
changes for the members of a pair was defined (p. 88), and the test of
significance involved computing, for Ds so defined, an $M_D$, $S_D$, and
thence $S_{M_D}$. For the small sample, or t, technique we need $s_D$ and
$s_{M_D}$, just as given previously for correlated means. The df is 1 less
than the number of pairs. For the setup involving the changes for
independent groups, we would need an $s_{D_M}$ instead of the $S_{D_M}$ of
(6.10). The required $s_{D_M}$ is given by

$$s_{D_M} = \sqrt{\frac{s_D^2}{N_E} + \frac{s_D^2}{N_C}}$$

in which

$$s_D^2 = \frac{\Sigma(D - M_{D_E})^2 + \Sigma(D - M_{D_C})^2}{N_E + N_C - 2}$$

with the subscripts E and C referring to experimental and control groups.
Thus, the procedure for testing hypotheses involving changes for two
groups is precisely the same as that for testing the difference between two
independent means, discussed previously; X is replaced by D, a difference
score.

One-tailed versus two-tailed test. Our discussion of the t technique so
far has been in terms of the t value needed for a two-tailed test at a given
level of significance. If the hypothesis to be tested or the decision to be
made logically warrants a one-tailed test, the t required for significance at
the .01 level would be found under the .02 column of Table E, and for the
.05 level the .10 column would be used. Those who do not wish to be
restricted to the P levels given in Table E will find for dfs up to 20 the P
associated with any t in Table XLV of Peters and Van Voorhis' Statistical
procedures and their mathematical bases. This table gives one-tailed values,
which need, of course, to be doubled for two-tailed tests.

Question of assumptions. When we use the tabled values of the t
distribution as a basis for judging significance or for setting confidence
limits, we are in effect presuming that some quantity, usually a ratio such
as $(M - M_h)/s_M$, or $M_D/s_{M_D}$, or $(M_1 - M_2)/s_{D_M}$, will in the
sampling sense follow the t distribution. The mathematical proof thereof is
based on certain assumptions: normality for the population of X scores and
of D scores for the first two ratios, and normality of Xs for both
populations with common, or equal, variances for the third ratio. Whether or
not these assumptions hold will usually be unknown.

It might be thought that the assumption of normality underlying the use
of t could be tested on the basis of the sample (or samples) at hand either
by testing the departure of $g_1$ (skewness) and $g_2$ (kurtosis) from zero
(or by a chi square technique, discussed in Chapter 13), but these methods of
testing for normality are not sensitive enough to lead us to reject, on the
basis of a small sample, the hypothesis of normality unless the departure
therefrom is very marked. Likewise, the as yet undiscussed test (see
Chapter 14) for a possible difference between variances is too insensitive
when used with small samples to lead to rejection of the hypothesis of
equal variances unless the difference between the two universe variances is
sizable; hence it is difficult to be sure that the assumption of equality of
variances is tenable when two groups are being compared by the t technique.
The foregoing statements are, of course, based on the proposition
that by statistical methods it can be proved, at a desired level of
significance, that a sample distribution did not arise from a normally
distributed universe or that two universe values are different, but such
methods will not prove normality nor prove that two universe values are
identical.

Since it is difficult to be sure that the assumptions will hold for a given
batch of data, the question may be raised as to the effect of violations of the
assumptions. Will too many or too few calculated ts reach the tabled
value for the .05 or the .01 levels of significance? Or stated differently, does
the chosen level of significance actually represent the probability of
making the type I error? Over the years there have accumulated both
mathematical deductions and empirical evidence indicating that the t test is
"robust" under violation of assumptions; that is, calculated ts tend to
follow closely the t distribution. There are exceptions to this rule, as is
shown by the recent empirical study by Boneau.*

Boneau, with the indispensable help of an electronic computer, calculated
1000 ts for the difference between independent means for each of 20
different combinations of conditions with regard to Ns, shapes of
distributions in the "universes," and equality or inequality of universe
variances. The percentage of the ts reaching the .05 and the .01 levels is
indicative of the disruption produced by specified violations of assumptions.

First, differences in variances (σs of 1 and 2, or one population variance
four times that of the other; both distributions normal) for Ns very small,
5 and 5, produced about 1 per cent too many ts at the .05 and the .01 levels,
but for Ns of 15 and 15 the discrepancies were only one tenth of a per cent.
With samples of size 5 from the universe having the smaller variance and 15
from the universe having the larger variance, too few reached the .05 and
the .01 levels, the .05 level being reached only .01 of the times and the .01
level only .001 of the trials. But when 15 cases and 5 cases were drawn,
respectively, from the universes having the smaller and larger variances, far
too many calculated ts reached "significance": 16 per cent at the 5 per
cent level and 6 per cent at the 1 per cent level. The moral is clear: if we
suspect that the variances may be unequal, we should make the two sample
sizes equal or nearly so. Presumably, the disruption of the t test will
depend on the relative magnitude of the two universe variances: the
larger the variance difference, the greater the disruption. In psychological
research, when sample sizes are large enough to permit any firm statement
about a difference between σs, it is rarely possible to conclude that the σ
for one population is twice that of the other, the ratio of the σs in the
Boneau study.
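
The reader with access to a computer can see the effect of unequal variances
combined with unequal Ns for himself. The Python sketch below is in the
spirit of such a sampling study but is not Boneau's procedure: the number of
trials, the seed, and the function names are assumptions, and 2.101 is the
standard tabled t for 18 df at the two-tailed .05 level. Both universes are
normal with the same mean, so that every "significant" t is a type I error.

```python
import random
from math import sqrt

def pooled_t(g1, g2):
    """t for the difference between independent means, pooled variance."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    sum_sq = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    s2 = sum_sq / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(s2 / n1 + s2 / n2)

def rejection_rate(n_small_var, n_large_var, t_crit, trials=4000, seed=1):
    """Proportion of ts reaching t_crit when the null hypothesis is true.

    One universe has sigma = 1, the other sigma = 2; both have mean 0.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g1 = [rng.gauss(0, 1) for _ in range(n_small_var)]
        g2 = [rng.gauss(0, 2) for _ in range(n_large_var)]
        if abs(pooled_t(g1, g2)) >= t_crit:
            hits += 1
    return hits / trials

T_05_DF18 = 2.101   # tabled t, df = 18, two-tailed .05 level
print("15 from smaller variance, 5 from larger:",
      rejection_rate(15, 5, T_05_DF18))
print("5 from smaller variance, 15 from larger:",
      rejection_rate(5, 15, T_05_DF18))
```

The first proportion should run well above .05 and the second well below it,
in the directions reported above; the exact figures will vary with the seed
and the number of trials.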

Second, when sampling from platykurtic (actually rectangular shaped)
distributions Boneau found negligible effects, but when sampling from
markedly skewed (J-shaped: $g_1 = 2.0$) distributions, Ns equal, he found
that 3 and 4 per cent of the ts reached the 5 per cent level and too few
reached the 1 per cent level. It is comforting to know that such extreme
skewness, rarely encountered in practice, will not lead to too many
significant ts.

* C. A. Boneau. The effects of violations of assumptions underlying the t
test. Psychol. Bull., 1960, 57, 49-64.

Third, although the foregoing results hold for both one- and two-tail
tests, Boneau found that when one sample was from a J-shaped distribution
and the other from either a normal or a rectangular distribution, the
distributions of the resulting calculated ts were skewed: a doubling of the
risk of falsely concluding that the mean of the J-distribution is lower than
that for the rectangular, also normal, distribution; conversely, "significant"
differences in the opposite direction occurred only half as often as expected
from the theoretical t curve. These results should give pause to the
advocates of one-tail tests; they also have obvious implications for two-tail
tests even though the number of ts, irrespective of direction, exceeded only
slightly the expected number.

Suppose that in one study the difference between two means for two
small samples leads to a t which falls at the .01 level and that in another
study two large samples yield means, for another trait, which are also
significantly different at the .01 level. Can we place as much reliance on the
first difference as on the second? The answer is yes, provided the two
studies have been carried out with the same degree of care as regards
controls and adequate sampling techniques, and provided it is safe to
presume that the fundamental assumptions underlying t are tenable. Thus
our confidence in a result based on small samples is a function not only of
the probability level of significance attained but also of our faith that
assumptions have been met. Since, as we have suggested, the conditions
of trait normality and equality of variances are exceedingly difficult to
demonstrate when the only information available is based on the small
samples at hand, we are forced to conclude that, in general, we cannot place
as much reliance on the results from small samples as on those from large
samples.

Although this last statement can, in light of Boneau's results, be
qualified, we still have the question of the place of small samples in
psychological research, and about this there will be a diversity of opinion.
We do not propose to settle the issue or even debate it; instead, we shall
mention a few points which we feel are pertinent. There are, of course,
types of research for which it is impossible or practically impossible to
secure more than a few cases, either because of their scarcity or because of
prohibitive costs. For such situations it is fortunate that the small sample
or t technique, which permits some allowance for the smallness of the
sample or samples, is available. Quite frequently small samples may be
useful in a preliminary study carried out solely for the purpose of guiding
the experimenter. If given hypotheses seem to be verified, the next step
should be to secure more cases for further verification rather than to rush
into print with positive conclusions.

It seems to the writer that those who publish statistical results based on a 
small number of cases should, unless they are positively sure that the basic 
assumptions underlying t have been met (and this assurance can seldom be
attained), adopt a more stringent level of significance than they would 
adopt if they had large samples. Admittedly, a more stringent criterion of 
significance means that the null hypothesis may be less frequently rejected 
and consequently that a real difference may be overlooked. At this point 
some readers may need to be reminded that the best way to avoid com- 
mitting type II errors is to avoid the use of small samples: the greater the 
number of cases the greater the likelihood of detecting a difference. 

An illustration of the fact that small samples are not conducive to
rejection of the null hypothesis unless the difference between universe
values is sizable may be in order. Let us suppose that the means for the
heights of two populations are 64.5 and 68.0 and that the universe standard
deviations are both equal to 2.7. An investigator who does not know these
facts draws a random sample of eight cases from each universe; and in
order to help him a little (and also simplify this discussion), we tell him that
each σ = 2.7. The standard error of the difference between means becomes
$2.7\sqrt{1/8 + 1/8}$, or 1.35. If the investigator accepts the .01 level of
significance, it is immediately apparent that an obtained difference would
have to be at least (2.58)(1.35), or 3.48, for him to reject the null
hypothesis. (Why are we justified in using the normal deviate, 2.58, with
such small samples?) A little consideration of the fact that the sampling
distribution of differences between means will center at 3.5 indicates that
the chances are nearly 50-50 that the investigator will be accepting the null
hypothesis even though the real difference is more than a standard deviation
in magnitude.
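
The "nearly 50-50" chance can be made explicit. The Python sketch below is
an illustration only, with variable names that are assumptions; it uses the
figures of the example and the normal curve, which is justified here because
σ is taken as known.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 2.7, 8
true_difference = 68.0 - 64.5

se_difference = sigma * sqrt(1 / n + 1 / n)   # known sigma, so no estimate
cutoff = 2.58 * se_difference                 # .01 level, two-tailed

# Sampling distribution of obtained differences centers at the true value.
sampling = NormalDist(mu=true_difference, sigma=se_difference)
p_accept = sampling.cdf(cutoff) - sampling.cdf(-cutoff)

print("standard error of difference:", round(se_difference, 2))
print("difference required at .01  :", round(cutoff, 2))
print("chance of accepting the null:", round(p_accept, 3))
```

The probability so obtained is that of a type II error for this particular
real difference and these sample sizes.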

There are times when an investigator may be so anxious to accept the 
null hypothesis that he will seize upon a very high level of significance in 
order to better his chances for accepting the hypothesis of no difference. 
Another way for increasing the odds in favor of accepting the null hypo- 
thesis is to use exceedingly small samples. Now those who desire to claim 
that no difference exists must face the simple fact that such a proposition 
can never be proved on a sampling basis. The most convincing way to 
demonstrate that a difference is of no practical or scientific importance is 
to use large samples and the confidence interval method for specifying 
limits for the population difference. 


Chapter 8 


CORRELATION: INTRODUCTION 
AND COMPUTATION 


One of the chief tasks of a science is the analysis of the interrelations of
the variables with which it deals. In the physical sciences, and frequently
in the biological sciences, the interrelations can be determined by noting
how much of a change in one variable is associated with change in another.
The physicist studying the relationship between temperature and pressure
exerted by a gas can vary the former at will so as to determine the pressure
at different temperatures. In the social sciences, and sometimes in the
biological sciences, the variables studied are apt to be characteristics of
individuals (plant or animal); thus to study relationships the experimenter
is compelled to make measurements on several individuals. For example,
if two variables such as height and weight are under consideration, the
measured height and weight of N individuals will provide N pairs of
observations from which it can be determined whether the two vary
together. In either case it is important to determine the form (mathematical)
of the relationship and the accuracy with which it is possible to make
predictions.

Many relationships are expressible in terms of the simplest of all
mathematical forms, Y = A + BX, in which X and Y represent variables
and A and B are constants determinable from the observations. The
accuracy of prediction can be determined, and it is convenient that we have
some general measure of this accuracy. One such measure which can be
computed and which will yield information as to the degree of accuracy and
the degree of relationship is the correlation coefficient, designated r. This
measure of co-relation, as we shall soon see, not only tells us the degree of
relationship, but will also, in conjunction with the two means and standard
deviations, permit us to write the linear equation for predicting Y from X
or X from Y.

Our present discussion will be concerned with the determination of 
relationship between such typical variables as height, weight, strength, age, 
intelligence, social status, attitudes—i.e., with those variables which show 
variation from individual to individual. The question of the relationship 
between variables of this type can be stated quite simply: Is there a tendency 
for the individual who ranks high (or low) on one characteristic to be 
high (or low) on another also? It should be noted that at times a relationship 
may involve just one variable: Are heights of sons related to the 
heights of their fathers? Are the IQs of adults related to their childhood 
IQs? 


THE SCATTER DIAGRAM 


The first task is that of tabulation. If we have observations on the height 
and weight of a large number of individuals, using cross-sectional or 
coordinate paper, we can lay off on the y axis convenient tabulating 
intervals for, say, height and on the x axis intervals for weight. The rules 
for choosing intervals stated on p. 6 should be followed here. Tabulation 
then consists first of finding on the y axis the interval in which an individual's 
height falls and locating the interval on the x axis for his weight. A 
tally or dot is then placed in the cell formed by the intersection of these two 
intervals. The result of such a two-way or cross tabulation is referred to as 
a scatter diagram or correlation table. It will contain as many tallies as 
there are pairs of observations. The tallies in each row, or horizontal 
array, can be counted and recorded, separately by rows, to the right of the 
diagram. This procedure will, of course, yield the frequency distribution 
for all individuals with respect to the variable on the y axis. A similar 
count, and recording at the top, of tallies for each column, or vertical 
array, will yield the distribution for the other variable. The sum of the 
frequencies for either of these marginal distributions should equal N, or 
the number of pairs of observations. 
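
The tallying procedure just described can be sketched in a few lines of Python. This is an illustration only: the function name `cross_tabulate`, the interval origins and widths, and the six (weight, height) pairs are hypothetical and are not taken from Figs. 8.1 or 8.2.

```python
from collections import Counter


def cross_tabulate(pairs, x_start, x_width, y_start, y_width):
    """Tally (x, y) pairs into cells and return cells plus both marginals."""
    cells = Counter()
    for x, y in pairs:
        col = int((x - x_start) // x_width)  # interval index on the x axis
        row = int((y - y_start) // y_width)  # interval index on the y axis
        cells[(row, col)] += 1               # one tally per pair of observations

    row_totals = Counter()  # marginal distribution for the y variable
    col_totals = Counter()  # marginal distribution for the x variable
    for (row, col), f in cells.items():
        row_totals[row] += f
        col_totals[col] += f
    return cells, row_totals, col_totals


# Hypothetical (weight, height) pairs; weight on the x axis, height on the y axis.
pairs = [(150, 68), (162, 70), (148, 66), (175, 72), (158, 69), (163, 71)]
cells, row_totals, col_totals = cross_tabulate(pairs, 140, 10, 66, 2)
n = len(pairs)
```

Each marginal distribution sums to N, which is the check mentioned in the text.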

Figures 8.1 and 8.2 are illustrative scatter diagrams, but not models so 
far as number of grouping intervals is concerned. In practice, from 12 to 
20 intervals should be used in order to reduce the grouping error to a 
negligible amount. It is to be understood that the intervals in these charts 
are 40-44, 30-39, 50-59, etc. The student should study these diagrams so 
as to grasp some of the mechanical details involved in their construction. 
It should be noted that the number and size of the intervals for the two 
variables need not be the same, and that the zero points on the scales of 
measurement need not appear or even be indicated on the axes. 
Fig. 8.1. Correlation scatter diagram for two tests. 


Fig. 8.2. Correlation scatter for two forms of Stanford-Binet. 

It can readily be seen that these two diagrams represent different 
degrees of relationship. A precise method for measuring or describing 
degree of relationship or association or correlation will be discussed in 
detail in the pages to follow. We shall begin with a symbolic definition of a 
basic correlation coefficient, indicate its computation, and then discuss its 
meaning, interpretation, assumptions, and … 
elementary mathematical derivations will be either indicated or given 
whenever it is thought that their inclusion will be useful in clarifying a 
point or clinching an assumption. 

The Pearson product moment correlation coefficient is defined by 

$$ r = \frac{\sum xy}{N S_x S_y} \tag{8.1} $$

in which x and y represent deviation measures from the respective means 
of the two variables, i.e., $x = X - M_x$ and $y = Y - M_y$, the Ss in the 
denominator are the standard deviations of the two distributions, and N is 
the number of individuals measured. With reference to a scatter diagram, 
$M_x$ and $S_x$ hold for the marginal distribution at the top, whereas $M_y$ and $S_y$ 
hold for the distribution to the right. The numerator term, $\sum xy$, implies 
that the product of each individual's x and y is determined, and that all 
such products are summed algebraically. There will, of course, be N 
products in this sum, some of which will be positive, some negative, and 
perhaps some zero. 
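
As a minimal sketch of definition formula (8.1), the following Python function forms the deviation scores, the two standard deviations (dividing by N, as S is used here), and the sum of cross products. The function name `pearson_r` and the small height and weight lists are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Correlation by the definition formula: sum(xy) / (N * S_x * S_y)."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    dev_x = [x - m_x for x in xs]  # x = X - M_x
    dev_y = [y - m_y for y in ys]  # y = Y - M_y
    s_x = math.sqrt(sum(d * d for d in dev_x) / n)
    s_y = math.sqrt(sum(d * d for d in dev_y) / n)
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))  # N signed products
    return sum_xy / (n * s_x * s_y)


# Hypothetical paired observations on six individuals.
heights = [64, 66, 67, 68, 70, 71]
weights = [130, 142, 150, 149, 165, 170]
r = pearson_r(heights, weights)
```

Positive products arise when an individual is above (or below) both means, negative products when he is above one mean and below the other.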
Definition formula (8.1) is seldom used for computation. For N small 
a usable computational equivalent is 

$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} \tag{8.2} $$

which involves four familiar sums, and the sum of the products of the 
paired raw scores. This formula is unwieldy for large N and/or scores 
which are numerically large. For reasons which will become apparent 
later, the careful researcher will always make a scatter diagram, and once 
this has been done it is economical to compute r in terms of step-interval 
deviations from arbitrary origins. An appropriate formula is 

$$ r = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}} \tag{8.3} $$

in which $d_x$ is defined as an individual's score deviation, in step intervals, 
from an arbitrary origin on the X scale, and $d_y$ is defined similarly for the 
Y scale. The student will note the similarity of the radical terms to 
formula (3.5) for computing S. Formula (8.3) calls for two sums, two 
sums of squares, and a sum of cross products, all in terms of step or interval 
deviations from arbitrary origins. The arbitrary origins may be taken at 
the center or at the bottom of each distribution. The former will involve 
handling smaller figures but will have the disadvantage of introducing 
negative numbers. The latter scheme is better if a calculating machine is 
available. 
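
That formulas (8.2) and (8.3) agree can be checked with a short sketch. The helper names `r_from_sums` and `r_raw`, the interval midpoints, and the choice of origins at the lowest midpoints (27 and 65) are assumptions made for illustration.

```python
import math


def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Common form of (8.2) and (8.3): five sums and N."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def r_raw(xs, ys):
    """Collect the five sums from paired scores and apply the formula."""
    return r_from_sums(
        len(xs), sum(xs), sum(ys),
        sum(x * x for x in xs), sum(y * y for y in ys),
        sum(x * y for x, y in zip(xs, ys)),
    )


# Hypothetical interval midpoints for seven individuals.
xs = [27, 32, 32, 37, 42, 47, 52]        # step interval of 5
ys = [65, 75, 85, 85, 105, 115, 145]     # step interval of 10

# Step deviations from arbitrary origins at the bottom interval of each scale.
d_x = [(x - 27) // 5 for x in xs]
d_y = [(y - 65) // 10 for y in ys]

r_gross = r_raw(xs, ys)    # formula (8.2) on the gross scores
r_coded = r_raw(d_x, d_y)  # formula (8.3) on the step deviations
```

The two values agree because r is unaffected by a change of origin or of unit on either scale, while the coded sums involve much smaller figures.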
CALCULATION OF r 


The computation of r will be illustrated for both hand and machine 
calculating methods. The hand calculation scheme here used may not be 
quite as economical as other available schemes, but the particular setup 
has the advantage that it forms an economical basis for machine com- 
putation, and the author presumes that practically all those who are apt 
to compute more than a few rs will have access to a calculating machine of 
the Monroe or Marchant or Friden type. Once the steps involved in the 
hand calculation form are grasped, it becomes easy to transfer them to 
machine work. The writer has never found the commercial correlation 
charts helpful. All that is necessary is a sheet of cross-section paper ruled 
four lines to the inch, on which we can readily lay out the axes, in intervals, 
for tabulating or tallying. When the scatter diagram has been made and 
the tally (or dot) marks have been summed across and up to get the mar- 
ginal frequencies (as shown in Figs. 8.1 and 8.2), the d values, taken from 
an arbitrary origin at the bottom-most interval for each variable, can be 
written, preferably with colored lead, alongside the marginal frequencies 
(see Table 8.1). The columns of fd and fd² values along each margin can 
be obtained by multiplying in exactly the same manner as was previously 
done for calculating the standard deviation. The sums of these columns 
provide four of the five sums needed for r. 

In order to obtain $\sum d_x d_y$, each individual's $d_x$ must be multiplied by his 
$d_y$, and all such products then summed. In the 140 interval on the y axis 
we find one individual whose score on the X variable falls in the 50 interval 
on the x axis. In terms of step deviations his $d_y$ value is 8 and his $d_x$ value 
is 5, and therefore 5 times 8, or 40, represents his $d_x d_y$ product. Another 
individual with the same $d_y$ value has a $d_x$ value of 6, whence 6 times 8 is his 
contribution to $\sum d_x d_y$. The third individual in the 140 interval has a $d_x$ 
value of 7, whence 7 times 8 is his product. These three individuals 
contribute 5 × 8 + 6 × 8 + 7 × 8, or 144, to the sum of products. The 
$d_y$ value of 8 is a common factor to these three products, whence 
8(5 + 6 + 7) or 8 × 18 yields 144. This suggests a scheme, for computing 
the $d_x d_y$ sum, which involves first summing the $d_x$ values for a particular Y 
interval or array and then multiplying this sum by the $d_y$ value. Thus the 
$d_x$ values of the individuals in the 130 interval sum to 34, and in the 120 
interval to 34, and so on down to the 60 interval, which yields 2 as the sum 
of the $d_x$ values. The determination of these $d_x$ sums is greatly facilitated 
by the use of a runner on which the $d_x$ values 0, 1, 2, 3, ..., have been 
labeled to correspond exactly with the deviations in step intervals alongside 
the marginal distribution at the top of the diagram. Since each of these 
$d_x$ sums is to be multiplied by a $d_y$ value and then all the products summed, 
it is convenient first to record the $d_x$ sums to the right as a separate column 
and then to multiply each $d_x$ sum by the corresponding $d_y$ value, thus 
leading to the last column of figures. Before these final multiplications are 
made, the column of $d_x$ sums should be added to see whether it agrees with 
the $\sum d_x$ already computed from the marginal distribution of X scores. 
Thus an internal check is provided for the column of $d_x$ sums; all other 
computations should be done twice in order to insure accuracy. 

Table 8.1.* Computation of r 

N = 61    $\sum d_x$ = 224    $\sum d_x^2$ = 1012    $\sum d_y$ = 253    $\sum d_y^2$ = 1297    $\sum d_x d_y$ = 1097 

$$ r = \frac{(61)(1097) - (224)(253)}{\sqrt{(61)(1012) - (224)^2}\,\sqrt{(61)(1297) - (253)^2}} $$

* Space limitations account for the use of too few intervals in this table. A complete 
labeling of intervals would be 25-29, etc., and 60-69, etc. 
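
The substitution shown with Table 8.1 can be carried through as a check. The sketch below assumes that the five sums are assigned as they appear in the substitution line (224 and 1012 for the X scale, 253 and 1297 for the Y scale, 1097 for the cross products); the variable names are hypothetical.

```python
import math

# Sums read from the substitution line of Table 8.1.
n = 61
sum_dx, sum_dx2 = 224, 1012   # X scale: sum and sum of squares of step deviations
sum_dy, sum_dy2 = 253, 1297   # Y scale: sum and sum of squares of step deviations
sum_dxdy = 1097               # sum of cross products

numerator = n * sum_dxdy - sum_dx * sum_dy
denominator = (math.sqrt(n * sum_dx2 - sum_dx ** 2)
               * math.sqrt(n * sum_dy2 - sum_dy ** 2))
r = numerator / denominator   # formula (8.3)

# Row scheme for the 140 interval: d_y = 8 is a common factor of the
# three products, so the d_x values are summed first and multiplied once.
row_contribution = 8 * (5 + 6 + 7)
```

With these sums the numerator is 10,245 and r comes to roughly .78; the row contribution of 144 matches the three separate products worked out in the text.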

When a calculator is available, the work sheet need not include the fd 
and fd² columns, since the sums of these two columns can readily be 
obtained by the method discussed on pp. 22-23. This means that the 
column of $d_x$ sums can be placed alongside the $d_y$ values; then each $d_x$ 
sum can be multiplied by the juxtaposed $d_y$ value, with the products 
allowed to accumulate in the dial as the needed $\sum d_x d_y$. Thus the right-hand 
column figures need not appear on the work sheet. 

The substitution of the five sums into formula (8.3) is straightforward. 
The denominator factors are evaluated as explained on p. 23, and the 
numerator is obtained by punching $\sum d_x d_y$ into the keyboard and multiplying 
by N; then, with the product left in the lower dial, $\sum d_x$ is subtracted 
$\sum d_y$ times. If needed, the two means can be obtained by substituting $\sum d_x$ 
and $\sum d_y$ into (3.3), and the two standard deviations by multiplying the 
proper radical by the interval size and dividing by N [equivalent to 
substituting the sum and sum of squares into (3.5)]. 


Chapter 9 


CORRELATION: 
INTERPRETATIONS 
AND ASSUMPTIONS 


Intelligent use of the correlation coefficient and critical understanding 
of its use by others are impossible without knowledge of its properties. It 
is not sufficient that we be able merely to recognize r as a measure of 
relationship. It is a peculiar kind of measure which permits certain 
interpretations provided certain assumptions are tenable and provided we 
consider possible disturbing factors. Since the interpretations of r are so 
closely related to assumptions, no attempt will be made to present a 
separate discussion of these two aspects. The factors which affect r, and 
which are therefore limitations additional to assumptions, will be discussed 
in Chapter 10. 


STUDY OF SCATTERGRAM 


We shall begin by making a somewhat detailed study of certain properties 
of a typical scatter diagram. The columns and rows of the diagram 
have already been referred to as vertical and horizontal arrays, the intersection 
of two arrays has been called a cell, and the meaning of the 
marginal distributions has been given. If the scatter diagram depicted in 
Table 9.1 is examined, it will be noted that each vertical (and also each 
horizontal) array contains a frequency distribution, and that the marginal 
totals really represent the number of cases in these array distributions. 
These array distributions are very much like any other typical distribution: 
bell-shaped with a clustering or scattering about a central value. The mean 
and standard deviation again become useful descriptive terms. Thus, 
in Table 9.1, the mean height of sons whose fathers were 64 inches tall is 
found to be 66.8 inches. This is simply the mean of the twelve cases which 
fall in this particular array. Similarly for all the vertical arrays we have the 
means as recorded along the bottom of Table 9.1. The means of the 
horizontal array distributions have been recorded to the right of the 
scatter diagram. For example, the mean height of the 10 fathers whose 
sons were 72 inches tall is 70.0 inches. 

Table 9.1. Correlation table for height of fathers (X) and height of sons (Y) 

Height of father (X):   64    65    66    67    68    69    70    71    72 
Mean height of sons:    66.8  66.8  67.6  67.8  68.6  69.1  69.5  70.6  70.3 

r = .56    N = 192    $S_x$ = 2.49    $S_y$ = 2.33    $M_x$ = 67.69    $M_y$ = 68.44 

If the means of the vertical arrays are plotted (see crosses in Fig. 9.1) 
two things will be noticed: the means are progressively greater as we pass 
from short to tall fathers, and they fall approximately on a straight line. 
It will be noted (see dots in Fig. 9.1) that the means for the horizontal 
arrays also approximate a line and show progression. Now, with reference 
to the means of the vertical arrays, each represents the mean height of 
sons of fathers of a particular height and therefore may be used as a basis 
for predicting the height, if unknown, of a man if we have been told the 
height of his father. Thus, if the father is 66 inches tall, the best estimate 
of his son's height is 67.6, the observed mean height of men whose fathers 
are 66 inches in height. 

Fig. 9.1. Plot of array means for data of Table 9.1. 

Obviously such an estimation would be subject to considerable error, 
since we have also the observable fact that the heights of sons of fathers 
66 inches tall show a large amount of variation about the array average. 
This variation tells us something about the possible magnitude of the error 
involved in using 67.6, the array mean, as our estimated value. The 
unknown height, of which we take an array mean as an estimate, may 
actually fall anywhere within rather wide limits on either side of the array 
mean. These limits can be described in terms of the standard deviation of 
the array distribution. The standard deviation for the distribution of 
heights of sons whose fathers were 66 inches in height is about 2.1. Now, 
if we take 67.6 as the best estimate, we can say that, if we were to predict 
the height of 100 sons (fathers 66 inches), about 68 per cent of the time the 
error would be within the limits 67.6 ± 2.1, 95 per cent within 67.6 ± 4.2, 
and nearly always within the limits 67.6 ± 6.3. Likewise, when the Ss for 
the several arrays have been computed, a statement of the limits of the 
error in predicting any son's height from his father's height can be made. 
Such a procedure will yield as many measures of error as there are vertical 
arrays. We shall soon see that a convenient assumption can be made which 
will usually allow us to use a single indication of the error of estimate. 
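
Prediction from array means, with a separate measure of error for each array, can be sketched as follows. The function name `array_stats` and the eight father and son pairs are hypothetical; they are not the cases of Table 9.1.

```python
import math
from collections import defaultdict


def array_stats(pairs):
    """For each X value, return (n, mean, S) of the Y values in that array."""
    arrays = defaultdict(list)
    for x, y in pairs:
        arrays[x].append(y)  # one vertical array per value of X

    stats = {}
    for x, ys in arrays.items():
        mean = sum(ys) / len(ys)
        sd = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
        stats[x] = (len(ys), mean, sd)
    return stats


# Hypothetical (father, son) heights in inches.
pairs = [(66, 65), (66, 67), (66, 68), (66, 70),
         (68, 66), (68, 69), (68, 70), (68, 71)]
stats = array_stats(pairs)

n_66, mean_66, sd_66 = stats[66]
limits_68_per_cent = (mean_66 - sd_66, mean_66 + sd_66)  # array mean ± 1 S
```

Each array supplies its own mean and its own S, so there are as many measures of error as there are arrays; this is the inconvenience that a single error of estimate is meant to remove.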

Let us return again to the line of the means. Two such lines have been 
drawn in Fig. 9.1; one line “fits” the means of the vertical, the other the 
means of the horizontal, arrays. Let us for the present confine our 
attention to the means of the vertical arrays. They do not lie exactly on the 
drawn line; some are above, some below. If they fell exactly on the line, a 
prediction based on an array mean would be precisely the same as a 
prediction obtained by noting the Y value of the line where it cuts the 
middle of the array. Furthermore, if the means were exactly on a straight 
line, we might write the equation for this line in the form Y = BX + A, 
where A equals the y intercept (value of Y where the line crosses the y axis) 
and B equals the slope of the line (the inclination of the line to the x axis). 
With A and B known, the value of Y for a particular X can be readily 
estimated. 

But, since the means do not lie exactly on a straight line, the foregoing 
reasoning would not seem offhand to yield us anything of practical value. 
From many points of view, however, it is desirable that we determine the 
equation of the straight line which best “fits” the means, i.e., the equation 
of a line which passes near all the means. Then we can use this equation 
instead of the array means in making predictions. The justification for this 
procedure depends on the validity or tenability of an assumption: we 
assume that the failure of the means to fall exactly on a straight line is due 
to chance fluctuations in the means. Each array mean is based on a sample 
and consequently deviates more or less from the true or population value 
of the mean for the array. This is equivalent to saying that, if all the array 
means were based on a much larger number of cases, we could assume that 
they would approximate more exactly a straight line. This is an assumption 
which can always be made provided the array means for a particular 
scatter do not show marked deviations from linear form. (Adequate 
checks in terms of probability, to be described later, can be utilized to 
ascertain whether the fluctuations from linearity are larger than is reasonable 
on the basis of chance.) 


THE BEST-FIT LINE 


We can now consider one of the advantages of using a line instead of 
the several array means as a basis for prediction. The location of the line is 
dependent on all the means, or rather on all the cases. It therefore seems 
reasonable to believe that the line would be more stable from the sampling 
point of view than would the array means, each of which is based on a 
rather small number of cases. 

If we accept the assumption of linearity of array means, our problem is 
that of determining A and B so that we can write the equation of the line of 
means. We need the equations of two lines: Y = BX + A for the means 
of the vertical arrays and X = B’Y + A’ for the horizontal array means. 
We shall consider the determination of the constants A and B for the first 
equation, but before doing so something must be said concerning what is 
meant by a “best-fit” line. The constant A gives the y intercept, i.e., 
tells us where the line cuts the y axis. Suppose we think of several possible 
lines having the same slope (the same B) as the line in Fig. 9.1 which passes 
near the crosses. Obviously, if we considered a line passing near the top or 
bottom of the scatter diagram, it would be a “worse fit” than that drawn in 
Fig. 9.1. Likewise, if we think of pivoting the line about some point, 
thereby altering its slope, it can be readily seen that rotating it to a vertical 
or horizontal position would give a worse fit. It should now be clear that 
the assigning of some values to A and B will lead to a worse fit than that 
obtained by certain other values, or conversely that some values will yield 
a much better fit than others. 

One criterion accepted as a basis for a best-fit line is that the sum of the 
squares of the deviations from the line shall be as small as possible. With 
respect to determining the best-fit line to the means of the vertical arrays, 
this criterion or definition of fit implies that the values of A and B are to be 
such that the sum of the squared deviations of the observed heights of 
sons (deviations in an up and down or vertical direction) about the 
line will be a minimum. Stated in symbols, let Y' = BX + A, where Y' 
(read Y prime) is the value estimated from a given X, and let Y be the 
observed value. Then $(Y - Y')^2$ represents the squared deviation of any 
Y from the line or estimated value. The problem is so to choose A and B 
as to make $\sum (Y - Y')^2$ as small as possible. It is more convenient to deal 
with both the equation, y' = bx + a, and the sum, $\sum (y - y')^2$, in deviation 
units, with y' and y as deviations from $M_y$ and $x = X - M_x$. This is 
merely the translation of the axes which makes the origin or reference 
point coincide with $M_x$ and $M_y$. The student should visualize the meaning 
of this shift of axes. Note that the pattern of tallies is not changed by this 
simple transformation. Do you think that the slope B will equal the slope 
b? Will A = a? Let us keep the first question in abeyance and examine 
now the second question. Both A and a represent the y intercepts of the 
desired prediction line. If it is not immediately obvious to the student that 
A may not equal a, he should imagine that in Table 9.1 and Fig. 9.1 the 
axes have been moved so that the origin is at the center of the scatter 
diagram, and then ask himself where the line through the means of the 
vertical arrays would cut the new y axis. (Incidentally, it should be noted 
that the value of A cannot be read directly from Fig. 9.1 for the simple 
reason that the reference frame as drawn does not include the origin. The 
real y and x axes of the original measures would be, respectively, to the left 
of, and lower than, the indicated axes.) 

It is of interest to speculate concerning the value of a in the equation 
y' = bx + a. Common sense would suggest that, if an individual were 
average on X, the best guess would be that he would be average on Y. 
That is, if $X = M_x$, we would expect Y' to equal $M_y$. But, if an individual's 
X measure fell at $M_x$, his deviation, or x value, would be 0, and the estimated 
value of Y as being equal to $M_y$ would in terms of deviation scores 
become 0. This would imply that the prediction line would pass through 
the origin of the deviation score reference axes, and consequently that the 
y intercept would be zero; hence a = 0. For the purpose of simplifying 
the determination of the best value for b, we ask the reader to accept, on 
the basis of the foregoing reasoning, that a = 0 for the best-fitting line. If 
we carried both a and b along in the following development, a would in 
fact turn out to be zero. 

This permits us to write y' = bx as the equation for estimating y, in 
deviation units, from x, or deviation values of X. Our task becomes that of 
determining the value of b which will make $\sum (y - y')^2$ a minimum. 
Incidentally, it should be obvious that the discrepancy of any particular y 
value from the desired line has the same numerical value as the deviation 
of its corresponding original Y value from the line, and that 
$\sum (y - y')^2 = \sum (Y - Y')^2$. When we have determined the optimal value for b in 
y' = bx, we can readily pass back to the original reference frames, the 
gross score axes, by substituting for y' the value $Y' - M_y$ and for x, 
$X - M_x$. With a fixed as zero, i.e., with the y intercept equal to zero, we 
can think of the line as passing through the origin (deviation axes); i.e., 
its up and down location is fixed. Obviously, many lines could be drawn 
through the origin, and they would differ only as to slope, i.e., as to b. 
Of all possible lines which may be drawn through the origin, some will be 
closer than others to the observations (tallies) in toto. The student might 
imagine several lines any of which would seem to constitute a good fit. As 
he takes lines with either greater or lesser slope than those of apparently 
good fit, the fits will become worse; and of those which seem to fit, some 
will actually be better than others. The student might think that it would 
only be necessary to draw what seems by inspection to be the best-fitting 
line, and then obtain its slope by actually measuring the angle which it 
makes with the horizontal (with needed adjustment to allow for the 
measurement units). The trouble with this procedure is that individuals 
would tend to disagree regarding which of several lines was really best; 
also, the measurement of angles would be none too exact. What we need 
is an objective procedure, a method that will yield the value of b which 
leads to the best possible fit in the sense of reducing the sum of the squares 
of the discrepancies to a minimum. 
We set up the function 

$$ f = \frac{\sum (y - y')^2}{N} = \frac{\sum (y - bx)^2}{N} $$

in which we have N deviations of the form y - y' or y - bx (since 
y' = bx). These deviations when squared, summed, and divided by N give 
us a quantity or function which is to be minimized by the proper choice of 
b. The value to be assigned to b can best be ascertained by the calculus.* 
This is done by taking the derivative of the function with respect to b, 
setting this derivative equal to zero, and then solving for b. Thus 

$$ \frac{df}{db} = \frac{-2\sum x(y - bx)}{N} $$

which, set equal to zero and divided by -2, gives 

$$ \frac{\sum x(y - bx)}{N} = 0 $$

or 

$$ \frac{\sum xy - b\sum x^2}{N} = 0 $$

then 

$$ \frac{\sum xy}{N} - b\frac{\sum x^2}{N} = 0 $$

The first or cross-product term involves the correlation coefficient as 
defined by formula (8.1), from which definition formula we see that 
$\sum xy / N = r S_x S_y$; and since $\sum x^2 / N = S_x^2$, we have 

$$ r S_x S_y - b S_x^2 = 0 $$

or 

$$ r S_y - b S_x = 0 $$

which gives 

$$ b = r\frac{S_y}{S_x} $$

as the optimal value for b. We therefore have 

$$ y' = r\frac{S_y}{S_x}x \tag{9.1} $$

as the equation for the best-fit line. This equation is in terms of deviation 
measures, and by proper substitution we get 

$$ Y' - M_y = r\frac{S_y}{S_x}(X - M_x) $$

or 

$$ Y' = r\frac{S_y}{S_x}X + \left(M_y - r\frac{S_y}{S_x}M_x\right) \tag{9.2} $$

as the equation in terms of the original or gross scores. This is the form 
which we would use in predicting Y from X. Note that $B = b = r(S_y/S_x)$ 
is the slope of this line and that the constant A is equal to the parentheses 
term. 

* The student who has not studied the calculus will either take the first part of the 
following derivation on faith or, if skeptical, will dig into a calculus text to satisfy himself 
that no magic is involved here. 
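
The outcome of the derivation can be checked numerically. The sketch below computes the least squares slope directly from the deviation scores, compares it with $r(S_y/S_x)$, and confirms that slopes a little greater or a little smaller give a larger mean squared discrepancy. The data and the helper names `deviations` and `mean_sq_discrepancy` are hypothetical.

```python
import math


def deviations(values):
    """Deviation scores about the mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def mean_sq_discrepancy(b, x, y):
    """The function f: mean of (y - b*x)^2 for a line through the origin."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


# Hypothetical gross scores (X = father, Y = son).
X = [64, 66, 67, 68, 70, 71]
Y = [66, 67, 69, 68, 71, 70]
x, y = deviations(X), deviations(Y)
n = len(x)

sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

b = sum_xy / sum_x2                 # solution of sum(xy) - b*sum(x^2) = 0
s_x = math.sqrt(sum_x2 / n)
s_y = math.sqrt(sum_y2 / n)
r = sum_xy / (n * s_x * s_y)        # formula (8.1)
b_from_r = r * s_y / s_x            # slope in formula (9.1)

f_at_b = mean_sq_discrepancy(b, x, y)
f_steeper = mean_sq_discrepancy(b + 0.1, x, y)
f_flatter = mean_sq_discrepancy(b - 0.1, x, y)
```

The two expressions for b coincide, and f is smaller at b than at either neighboring slope, as the minimum condition requires.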

By similar reasoning the equation of the best-fit line to the means† of 
the horizontal arrays is found to be 

$$ x' = r\frac{S_x}{S_y}y \tag{9.3} $$

which becomes 

$$ X' = r\frac{S_x}{S_y}Y + \left(M_x - r\frac{S_x}{S_y}M_y\right) \tag{9.4} $$

When both equations are written in the B and A notation, we may attach 
subscripts to differentiate between the Bs and between the As: 

$$ Y' = B_{yx}X + A_{yx} $$

$$ X' = B_{xy}Y + A_{xy} $$

Regression. Equations (9.1) and (9.3) in deviation score form and 
(9.2) and (9.4) in gross score form are known as regression equations, and 
the constants denoting slope are known as regression coefficients. It is 
assumed that prediction will be as accurate by means of a regression 
equation as by way of array means, and it can readily be seen that by using 
a regression equation we can predict from intermediate values, e.g., 64½. 
This is of especial advantage with grouped data: the array mean is associated 
only with the midpoint value of the grouping interval, whereas the 
regression line is not so limited since it is continuous. 


† More strictly speaking, we are fitting a line to means weighted according to their 
respective Ns; i.e., we are fitting a line to the observations. 
Rate of change. The results of the foregoing derivation make it clear 
that the correlation coefficient, along with the two means and the two 
standard deviations, enables us to write the equation by which either 
variable can be predicted from the other. The regression coefficients 
indicate the rate of change—unit of change in one variable per unit of 
change in the other—and in case the two standard deviations are equal, r 
itself indicates the rate of change. Thus we have one of the possible 
interpretations of the correlation coefficient. 

For the correlation table in Table 9.1 we get, by proper substitution, the 
following as the regression equations: 


Y' = .52X + 33.24 (to estimate son's height) 
X' = .60Y + 26.63 (to estimate father's height) 


The student should study Fig. 9.1 sufficiently to convince himself that .52 
is the slope of the line passing near the crosses, and that .60 represents the 
slope (with reference to the vertical) of the line through the dots. The 
student should also satisfy himself that the constants 33.24 and 26.63 
really represent the points at which the two lines intercept the y and x axes. 
Finally he should show that, if a father's height is at the mean of all fathers, 
the mean of the heights of all the sons is the best estimate of his son's 
height. 
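
The two regression equations can be reproduced from the summary values given with Table 9.1. The sketch assumes that 2.49 is the standard deviation of fathers' heights and 2.33 that of sons' heights, since this assignment yields the printed slopes; it also rounds each slope to two places before computing the intercept, which is what reproduces the printed constants.

```python
# Summary values from Table 9.1 (assignment of the two Ss is an assumption).
r = 0.56
m_x, m_y = 67.69, 68.44   # mean height of fathers, mean height of sons
s_x, s_y = 2.49, 2.33     # S for fathers, S for sons

# Formula (9.2): Y' = B*X + A, estimating son's height.
b_yx = round(r * s_y / s_x, 2)
a_yx = m_y - b_yx * m_x

# Formula (9.4): X' = B*Y + A, estimating father's height.
b_xy = round(r * s_x / s_y, 2)
a_xy = m_x - b_xy * m_y

# A father of exactly mean height leads to the mean height of all sons.
son_for_mean_father = b_yx * m_x + a_yx
```

The slopes come to .52 and .60 and the constants to 33.24 and 26.63, in agreement with the equations in the text.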


ACCURACY OF PREDICTION 


The next problem to which we turn is concerned with the accuracy of 
prediction by means of a regression equation. It has already been indicated 
that, when the mean of an array is used in prediction, the error of estimate 
is a function of the spread within that array. By introducing an assumption 
it becomes possible to substitute one measure of error in place of the several, 
numerically different, array standard deviations. An examination of the 
array distributions in Table 9.1 reveals that the vertical arrays differ from 
each other very little in dispersion (likewise, the horizontal arrays). If we 
were to compute the standard deviations for the vertical arrays, we would 
find differences, for this diagram, of such size as could readily be attri- 
buted to chance or sampling fluctuations; i.e., we assume that, if we had a 
much larger N, the array dispersions would be very nearly equal. Ordi- 
narily this assumption of homoscedasticity can be met, and one measure of 
dispersion can be used for all the vertical arrays (and another for all the 
horizontal arrays). 

Error of estimate. One such measure might be an average of the 
array Ss, but to determine this we would need first to compute all the Ss, a 
somewhat laborious job. Since we are to use the regression line, instead of 
array means, as a basis for prediction, we really need something corresponding 
to the S about this line. Such a value can be obtained by noting 
that y - y' (or Y - Y') represents the discrepancy between estimated and 
observed values and that $\sum (y - y')^2 / N$ is the mean of the squared deviations, 
the root of which will be the standard deviation of the discrepancies 
between estimated and observed values. This will be taken as the one 
standard deviation to replace the several standard deviations as our 
measure of the error of prediction. This particular standard deviation, 
defined as the square root of $\sum (y - y')^2 / N$, is called the standard error of 
estimate. It may be determined in two ways. First we can take a roundabout 
way which involves these steps: the prediction of each Y by use of 
equation (9.2), or each y by use of (9.1); the calculation of the discrepancies 
(Y - Y') or (y - y'); squaring, summing, dividing by N, and taking the 
square root. A quicker method for determining the standard error of 
estimate is readily derived algebraically. 

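Let $S_{y \cdot x}$ stand for the standard error of Y as estimated from X; then by 
definition, 

$$ S_{y \cdot x}^2 = \frac{\sum (y - y')^2}{N} $$

but 

$$ y' = r\frac{S_y}{S_x}x $$

by formula (9.1) whence 

$$
\begin{aligned}
S_{y \cdot x}^2 &= \frac{1}{N}\sum\left(y - r\frac{S_y}{S_x}x\right)^2 \\
&= \frac{1}{N}\sum\left(y^2 - 2r\frac{S_y}{S_x}xy + r^2\frac{S_y^2}{S_x^2}x^2\right) \\
&= \frac{\sum y^2}{N} - 2r\frac{S_y}{S_x}\left(\frac{\sum xy}{N}\right) + r^2\frac{S_y^2}{S_x^2}\left(\frac{\sum x^2}{N}\right) \\
&= S_y^2 - 2r\frac{S_y}{S_x}\,rS_xS_y + r^2S_y^2 \\
&= S_y^2 - r^2S_y^2
\end{aligned}
$$

then 

$$ S_{y \cdot x} = S_y\sqrt{1 - r^2} \tag{9.5} $$

By a similar line of reasoning it can be shown that 

$$ S_{x \cdot y} = S_x\sqrt{1 - r^2} \tag{9.6} $$

which gives the standard error of X as estimated from Y. 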
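
The roundabout and the quick methods for the standard error of estimate can be compared directly. In this sketch the function name `standard_error_two_ways` and the six pairs of scores are hypothetical.

```python
import math


def standard_error_two_ways(X, Y):
    """Return (roundabout, shortcut) values of the standard error of estimate."""
    n = len(X)
    m_x, m_y = sum(X) / n, sum(Y) / n
    x = [v - m_x for v in X]
    y = [v - m_y for v in Y]
    s_x = math.sqrt(sum(v * v for v in x) / n)
    s_y = math.sqrt(sum(v * v for v in y) / n)
    r = sum(a * b for a, b in zip(x, y)) / (n * s_x * s_y)

    # Roundabout way: predict each y by (9.1), then square, sum, divide by N.
    b = r * s_y / s_x
    roundabout = math.sqrt(sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / n)

    # Quick way: formula (9.5).
    shortcut = s_y * math.sqrt(1 - r ** 2)
    return roundabout, shortcut


X = [64, 66, 67, 68, 70, 71]
Y = [66, 67, 69, 68, 71, 70]
roundabout, shortcut = standard_error_two_ways(X, Y)
```

The two results agree to within rounding error, which is what the algebra above asserts.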
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Thus the correlation coefficient not only enters into the prediction 
equations (9.1) to (9.4), but also permits us to gauge the accuracy of 
prediction. It should be noted in passing that we can write the equation 
of a best-fit line without first determining r and that the error of prediction 
can also be ascertained without recourse to r. Such a method for deter- 
mining the error of estimate has already been indicated: the square root of 
У(У — ҮМ, in which Y — Y' represents the computed discrepancy 
between observed and predicted values. This need not involve r unless 
the prediction equation is written in terms of r, as was done in (9.2). The 
equation Y' = A + BX can be written in the form 

$$Y' = \frac{\Sigma X^2 \Sigma Y - \Sigma X \Sigma XY}{N \Sigma X^2 - (\Sigma X)^2} + \frac{N \Sigma XY - \Sigma X \Sigma Y}{N \Sigma X^2 - (\Sigma X)^2} X \tag{9.7}$$

in which X and Y stand for gross or original measures. Formula (9.7) for 
the best-fitting line (least squares solution) does not involve means, Ss, or 
the correlation coefficient. If, as is frequently the case, we are interested in 
obtaining the equation for Y only, it will be noticed that it is unnecessary 
to compute the sum of the Y squares, which is not, however, a tremendous 
saving of time. Perhaps the quickest way for determining the equation is 
by direct substitution in (9.7), but the determination of the error of 
estimate (sometimes called the closeness of fit of the line) is certainly 
facilitated by calculating r and $S_y$ and substituting in (9.5).
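
Direct substitution in the raw-score formula for the best-fitting line requires only N and four sums. A minimal Python sketch follows; the function name `fit_line_from_sums` and the example scores are assumptions made for illustration.

```python
def fit_line_from_sums(xs, ys):
    """Return (A, B) for Y' = A + BX using only N and raw-score sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    a = (sum_xx * sum_y - sum_x * sum_xy) / denom  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b

# Hypothetical scores lying exactly on Y = 3 + 2X.
a, b = fit_line_from_sums([1, 2, 3, 4], [5, 7, 9, 11])
print(a, b)
```

Note that the sum of the squared Y values is never formed, which is the saving mentioned in the text.
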

The standard error of estimate is to be interpreted as a standard devia- 
tion, and in so doing we are tacitly assuming that the array distributions 
are not only equal in dispersion but also normal. For the correlation 
diagram in Table 9.1, we have $S_{y \cdot x} = 1.9$, which is to be considered the
standard deviation of the Y values about the regression line, Y' = .52X
+ 33.24. By use of this equation we would predict that the height of the
son of a man 70 inches tall (X = 70) would be 69.6, and the error of
estimate, 1.9, would be interpreted by saying that, if we made many such
predictions, 68.26 times out of a hundred the actual height of sons of
70-inch fathers would be within the limits 69.6 ± 1.9, and nearly always
within the limits 69.6 ± 3(1.9).
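
The arithmetic of this example can be retraced in a few lines of Python. This is a sketch only; the function names are invented, and the use of the normal curve for the coverage figures rests on the assumption, stated in the text, that the array distributions are normal.

```python
from statistics import NormalDist

def predict(x, slope=0.52, intercept=33.24):
    # Regression line quoted in the text: Y' = .52X + 33.24.
    return slope * x + intercept

def limits(predicted, error_of_estimate, k=1.0):
    # Predicted value plus and minus k standard errors of estimate.
    return (predicted - k * error_of_estimate,
            predicted + k * error_of_estimate)

def normal_coverage(k):
    # Proportion of a normal distribution within k standard deviations.
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

y_hat = round(predict(70), 1)
print(y_hat)                       # predicted height of the son
print(limits(y_hat, 1.9))          # limits for one error of estimate
print(limits(y_hat, 1.9, k=3))     # limits for three errors of estimate
print(round(100 * normal_coverage(1), 2))
```
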

This is a second method for interpreting the correlation coefficient: in 
terms of the accuracy of prediction or closeness of fit of regression lines. 
If no correlation exists, the errors of estimate are $S_{y \cdot x} = S_y$ and $S_{x \cdot y} = S_x$.
In this connection it can be seen from formulas (9.2) and (9.4) that, when
r = 0, the estimated Y, Y', becomes $M_y$ and X' becomes $M_x$. For example,
if it has been established that the correlation between toe length and IQ is 
zero, we would always take 100 (the mean) as our best guess for an 
individual's IQ regardless of toe length. The error of estimate would of 
course be the standard deviation of the distribution of IQs, and it would be 
said that toe length is useless in predicting IQ. The scatter diagram for
IQ as Y and toe length as X would exhibit the following characteristics:
first, the regression line Y' = A + BX would be horizontal, i.e., B would
equal zero, and the means of the arrays would fluctuate about the value
$M_y$, or A would equal $M_y$; and, second, all the array distributions would
have dispersions approximately equal to $S_y$. What would be the best guess
as to the other regression line and the standard deviations of the horizontal 
arrays? 

Now suppose the correlation between the variables were perfect (r = +1 
or — 1). The tallies in the scatter diagram would lie in a line, there would 
be no spreading about this line, the two regression lines would coincide, 
and no error would be involved in estimating X from Y or Y from X. 
That $S_{y \cdot x}$ and $S_{x \cdot y}$ would both be zero in case of perfect correlation is quite
evident when we consider formulas (9.5) and (9.6). 

At this point the student should note the difference between positive 
and negative correlation. In the case of a positive r, a high score goes with 
high and low with low, whereas, for a negative r, high goes with low and low
with high. With reference to the scatter diagram, a negative r typically
involves a swarm of tallies stretching from the upper-left to the lower-right
corner, whereas for a positive r the trend is from lower left to upper right
(this assumes that the axes have been laid off in the conventional fashion).
With reference to the regression equations, a negative r yields negative
regression coefficients or negative slope for the lines. The student should
be warned that an apparently negative r may in reality be positive. Thus,
if one variable is a test or performance scored in terms of time (or errors)
and the other variable is scored in terms of amount done, the scatter
diagram might show large time scores as going with small amounts of
work done, i.e., high with low, which might be wrongly taken to indicate
negative rather than positive correlation. Instead of asking whether high
goes with high and low with low, it is safer to ask whether best goes with
best. This rule, however, is difficult to apply when we are dealing with the
interrelation of personality traits, especially those which do not readily
permit of a statement as to which is the desirable end of the trait scale. The
sign of the correlation coefficient in such cases always needs a qualifying
statement which explicitly tells the direction of the relationship between the
variables. Obviously, as far as accuracy of prediction is concerned, the
error is the same for a negative and positive r of the same magnitude.

Alienation. To return to the interpretation of the correlation
coefficient by way of the standard error of estimate, we see that the factor in
formulas (9.5) and (9.6) which involves r is $\sqrt{1 - r^2}$. It is the value of this
which, when multiplied by the proper S, leads to the error of estimate. The
expression $\sqrt{1 - r^2}$ is called the coefficient of alienation. If r is zero, its
value is 1 and the error of estimate is the S for the variable being estimated.
Table 9.2 gives the value of the coefficient of alienation for varying values
of r. The student will do well to fix in mind the trend in this table. It will
be noted that, compared to a correlation of zero, an r of .60 reduces the
error of estimate by 20 per cent, whereas an r of .30 reduces it by about 5
per cent; that r must be as high as .866 before the error of estimate is
reduced by one-half; and that the difference in reduction between an r of
.70 and an r of .90 is approximately the same as that between .20 and .70.


Table 9.2. Values of the coefficient of alienation


    r      $\sqrt{1 - r^2}$        r      $\sqrt{1 - r^2}$
   .00         1.000              .60          .800
   .10          .995              .70          .714
   .20          .980              .80          .600
   .30          .954              .866         .500
   .40          .917              .90          .436
   .50          .866              .95          .312


This interpretation of r is most useful and at the same time most disturbing, 
since the errors of estimate for rs in the vicinity of .40 to .70, values usually 
found and utilized in predicting success from test results, are discouragingly 
large. 
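
The entries of Table 9.2, and the percentage reductions in error quoted above, can be regenerated with a short Python sketch. The function names are invented for illustration.

```python
import math

def alienation(r):
    """Coefficient of alienation, sqrt(1 - r^2)."""
    return math.sqrt(1 - r * r)

def percent_reduction(r):
    # Reduction in the error of estimate relative to r = 0, in per cent.
    return 100 * (1 - alienation(r))

for r in [0.00, 0.10, 0.20, 0.30, 0.40, 0.50,
          0.60, 0.70, 0.80, 0.866, 0.90, 0.95]:
    print(f"{r:5.3f}  {alienation(r):5.3f}  {percent_reduction(r):5.1f}")
```

The last column makes plain how slowly the error of estimate shrinks until r becomes quite high.
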

A somewhat different way of grasping the meaning of r, as it is applied to 
accuracy of prediction, is to square both sides of formula (9.5) and then 
solve explicitly for r. This leads to 


$$r^2 = 1 - \frac{S_{y \cdot x}^2}{S_y^2} \tag{9.8}$$


from which it is readily seen that the correlation coefficient depends on the 
accuracy of prediction relative to the total variance of the variable being 
predicted. 

It might be well at this time to bring together a few remarks concerning 
the assumptions involved in using and interpreting a correlation coefficient 
in terms of either rate of change or accuracy of prediction. When an r is 
reported, and no evidence to the contrary is given, we have a right to 
expect that the assumptions of linearity of regression and homoscedasticity 
have been met. The interpretation of r as rate of change definitely assumes 
linearity, and the interpretation in terms of the error of estimate definitely 
assumes both linearity and homoscedasticity. In certain special cases 
where the investigator is interested only in a one-way prediction, say Y 
from X, and there is no likelihood of ever reversing to predict X from Y, it
will suffice if the regression of Y on X, i.e., for predicting Y from X, be
linear and the Y or vertical array distributions be homoscedastic. The use 
of the correlation coefficient in predicting performance from age may be 
cited as an instance in which there is no need to worry about the possible 
nonlinear regression of age on score or the lack of homoscedasticity about 
this regression line. 

Although there are adequate checks for linearity and homoscedasticity, 
a careful scrutinization of the scatter diagram is usually sufficient to warn 
us of violent departures from these assumptions. Formula (8.2) and other 
nonplotting schemes for computing r give no inkling as to whether these 
assumptions are being violated and therefore cannot command the confi- 
dence of the careful investigator. The purpose of a research project might 
very well be the study of the relationship between two variables, but an 
end result in terms of a correlation coefficient, with no attention given to 
the form of the relationship, is inadequate. 


VARIANCE AND CORRELATION 


A third method of interpreting r is in terms of variance. Before discussing 
this interpretation, we must introduce an important theorem concerning 
the variance of a sum (or difference). Suppose that variable W is made up
of two parts U and V such that W = U + V. For example, the score on
an arithmetic test might consist of two parts: score in addition and score
in multiplication. Obviously, w = u + v, and therefore the variance of
the W variable is

$$S_w^2 = \frac{\Sigma w^2}{N} = \frac{\Sigma (u + v)^2}{N} = \frac{\Sigma u^2 + \Sigma v^2 + 2 \Sigma uv}{N} = S_u^2 + S_v^2 + 2 r_{uv} S_u S_v \tag{9.9}$$

and in case U and V are independent, we have

$$S_w^2 = S_u^2 + S_v^2 \tag{9.10}$$

If we are dealing with the difference, W = U - V, we have

$$S_w^2 = S_u^2 + S_v^2 - 2 r_{uv} S_u S_v \tag{9.11}$$

and for U and V independent, we have

$$S_w^2 = S_u^2 + S_v^2$$

which is identical with (9.10). In words, the variance of a sum (or difference)
of two independent variables is equal to the sum of their separate variances.
Variances are additive, whereas standard deviations are not. It can be
shown that, when U and V are distributed normally, their sum or difference
will also yield a normal distribution.
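
The theorem on the variance of a sum or difference can be verified on any set of paired scores. The Python sketch below is illustrative; the addition and multiplication scores are made up, and the function names are assumptions.

```python
import math

def variance(values):
    # Variance with N in the denominator, i.e., S squared.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def correlation(us, vs):
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    cross = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n
    return cross / math.sqrt(variance(us) * variance(vs))

def variance_by_theorem(us, vs, sign=+1):
    # sign=+1 gives the variance of U + V; sign=-1 gives that of U - V.
    s_u, s_v = math.sqrt(variance(us)), math.sqrt(variance(vs))
    return (variance(us) + variance(vs)
            + sign * 2 * correlation(us, vs) * s_u * s_v)

# Hypothetical part scores on an arithmetic test.
addition = [12, 15, 9, 14, 10, 13]
multiplication = [8, 11, 7, 9, 6, 12]
total = [u + v for u, v in zip(addition, multiplication)]
difference = [u - v for u, v in zip(addition, multiplication)]

print(variance(total), variance_by_theorem(addition, multiplication, +1))
print(variance(difference), variance_by_theorem(addition, multiplication, -1))
```

Because these two part scores are correlated, neither variance equals the simple sum of the separate variances; the cross-product term supplies the difference.
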

Now, with regard to the third method for interpreting r, let us note that
in deviation units an observed y can be thought of as made up of two
independent parts, the part which can be predicted from x, namely y',
and the residual or unpredictable part, (y - y'). Before going further we
must demonstrate that y' and (y - y') are really independent. The
numerator for the correlation between y' and (y - y') can be expressed as
$\Sigma y'(y - y')$. But, since $y' = r \frac{S_y}{S_x} x$ and
$(y - y') = y - r \frac{S_y}{S_x} x$, we have

$$
\begin{aligned}
\Sigma y'(y - y') &= \Sigma\, r \frac{S_y}{S_x} x \left( y - r \frac{S_y}{S_x} x \right) \\
&= r \frac{S_y}{S_x} \Sigma xy - r^2 \frac{S_y^2}{S_x^2} \Sigma x^2 \\
&= r \frac{S_y}{S_x} N r S_x S_y - r^2 \frac{S_y^2}{S_x^2} N S_x^2 \\
&= N r^2 S_y^2 - N r^2 S_y^2
\end{aligned}
$$

which is seen to be zero; hence y' and (y - y') are uncorrelated.
We have y = y' + (y - y'); whence, by the foregoing variance theorem,


$$S_y^2 = S_{y'}^2 + S_{y \cdot x}^2 \tag{9.12}$$

in which $S_{y \cdot x}^2$ is the variance of the residuals, (y - y'). If we divide both
sides of this equation by $S_y^2$, we get

$$1 = \frac{S_{y'}^2}{S_y^2} + \frac{S_{y \cdot x}^2}{S_y^2} \tag{9.13}$$

from which we see that, since the two ratios add to unity, either one can be
interpreted as a proportion (or a percentage by shifting the decimal
point). Thus the ratio of $S_{y'}^2$ to $S_y^2$ is the proportion of the variance in Y
which can be predicted from X, and the ratio of $S_{y \cdot x}^2$ to $S_y^2$ represents the
proportion of the variation (variance) of Y which is left over or remains or
cannot be predicted from X. A little reflection as to the meaning of this
residual variance should convince the student that we are here dealing
with the same variance which results if we square formula (9.5), thus

$$S_{y \cdot x}^2 = S_y^2 (1 - r^2)$$

which means that

$$\frac{S_{y \cdot x}^2}{S_y^2} = 1 - r^2$$

When we substitute this value into (9.13), we have

$$1 = \frac{S_{y'}^2}{S_y^2} + 1 - r^2$$

from which it is readily seen that the ratio

$$\frac{S_{y'}^2}{S_y^2} = r^2 \tag{9.14}$$


That is, the square of the correlation coefficient gives the proportion of the 
total variance of Y which is predictable from X, or $r^2$ measures the
proportion of the Y variance which can be attributed to variation in X.
The proportion of the variance of Y which is due to variables other than
X is given by $1 - r^2$. By shifting decimals, we can think of $r^2$ as indicating
a percentage, the percentage of variance which has been explained, and
$1 - r^2$ as the percentage of variance due to other causes. It will be noted
that $r^2$, not r, can be so interpreted. This is true because variances are
additive, whereas standard deviations are not. It should be emphasized 
that r? as a proportion has to do with variation expressed technically as 
variance. 
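
The partition of the Y variance into a predictable part and a residual part can be checked on any set of paired scores. The following Python sketch is an illustration under invented data; `variance_partition` and its dictionary keys are hypothetical names.

```python
import math

def variance_partition(xs, ys):
    """Split the variance of Y into predicted and residual parts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    x = [v - mx for v in xs]            # deviation scores
    y = [v - my for v in ys]
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx                   # equals r * S_y / S_x
    y_pred = [slope * a for a in x]     # y'
    resid = [b - p for b, p in zip(y, y_pred)]  # y - y'
    return {
        "r": r,
        "var_y": syy / n,
        "var_pred": sum(p * p for p in y_pred) / n,
        "var_resid": sum(e * e for e in resid) / n,
        "cross": sum(p * e for p, e in zip(y_pred, resid)),
    }

# Hypothetical paired scores.
parts = variance_partition([2, 4, 5, 7, 8, 10], [3, 6, 4, 8, 9, 9])
print(parts["var_pred"] / parts["var_y"], parts["r"] ** 2)
print(parts["var_resid"] / parts["var_y"], 1 - parts["r"] ** 2)
```

The entry `cross` is the sum of products of y' and (y - y'); it should vanish apart from rounding, which is the condition that lets the two variances be added.
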

It is of some interest to examine the meaning of $S_{y'}^2$. It is the square of
the standard deviation of the estimated values, and, with reference to the
scatter diagram, $S_{y'}$ corresponds approximately to what we would obtain
if we were to compute the standard deviation about $M_y$ of the vertical array
means, each weighted according to the number of cases in its array. As an
exercise, the student can prove $r^2 = S_{y'}^2/S_y^2$ by determining directly, rather
than by formula (9.5), that $S_{y'}^2 = r^2 S_y^2$. (HINT: use the deviation score
form of the regression equation.)

This third method of interpreting a correlation coefficient assumes
linearity of the regression line involved in predicting Y, or the dependent
variable, from X as the independent variable; i.e., the regression of Y on
X must be linear. If X were considered as the dependent variable, the
interpretation that $r^2$ indicates the proportion of the variance of X
explained by Y would assume linearity for the regression of X on Y. The
assumption of linearity becomes explicit if it is proved directly that
$S_{y'}^2 = r^2 S_y^2$, and it was implied when we used $S_{y \cdot x}^2$ in that this residual
variance was taken about a straight line. This interpretation does not
assume homoscedasticity, nor does it assume normality either for the
marginal or for the array distributions.

The investigator who is interested in analyzing variation and its possible
causes will prefer the interpretation of the correlation coefficient in terms
of variance. The problem is frequently one in which an attempt is made to
explain variation in one trait in terms of variation of another which is
conceived of as being more basic. The use of $r^2$ as the percentage of the
variance of a trait which is predictable by, or attributable to, variation in a
second variable becomes a valuable tool in the analysis of variation. Of
course we must use caution in assuming causation of one variable by
another. Logic, not statistical method, must be invoked to determine
whether a causal relationship exists, and the statistical interpretation
modified accordingly. Variation in X might cause variation in Y, or vice
versa, or variation in both X and Y might be due to the influence of some
other variable or variables.

To illustrate the interpretation of $r^2$ as a percentage, let us suppose we
have the performance of a group of school children on a substitution test.
Considerable variation in scores will be present, and we may rightfully
ask whether a portion of this variation is due to age differences. We can
determine the correlation between age and performance. Suppose r = .60;
this can be interpreted by saying that 36 per cent of the variance in 
performance is due to age differences, and 64 per cent is due to other 
causes. Likewise, the variance in crop yield due to variation in rainfall 
can be determined; or the variance in the height of a group of men may 
be analyzed into two or more parts, one of which might be the portion due 
to variation in the heights of their fathers. 


CORRELATION AND COMMON ELEMENTS 


A fourth possible interpretation of the correlation coefficient assumes 
that each of the two variables can be thought of as a summation of a 
number of equally potent, equally likely, independent elements, which can 
be either present or absent. Then the degree of correlation is a function of 
the number of elements common to the two variables. The general 
formula is 


$$r_{xy} = \frac{n_c}{\sqrt{(n_x + n_c)(n_y + n_c)}} \tag{9.15}$$

in which $n_x$ equals the number of elements unique to X, $n_y$ the number
unique to Y, and $n_c$ the number common to both variables. If the number
of elements in X equals the number in Y, r gives the proportion of elements
common to X and Y; if X is determined only by elements common to Y,
whereas Y has additional elements, $r^2$ gives the proportion of elements
entering into Y which determine X. There is little, if any, factual basis 
for believing that the assumptions stated are tenable so far as psychological 
variables are concerned, and therefore the interpretation of the correlation 
coefficient in terms of common elements may be viewed with scepticism. 
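
The two special cases of the common-elements formula can be seen by substituting a few counts. The sketch below is in Python; the function name `r_common` and the element counts are invented for illustration.

```python
import math

def r_common(n_x, n_y, n_c):
    """Correlation implied by n_c common elements, with n_x elements
    unique to X and n_y unique to Y, under the assumptions in the text."""
    return n_c / math.sqrt((n_x + n_c) * (n_y + n_c))

# Equal numbers of elements in X and Y: r is the proportion in common.
print(r_common(4, 4, 6))        # 6 of the 10 elements in each are shared

# X has no unique elements: r squared is the proportion of the elements
# in Y which also determine X.
print(r_common(0, 12, 4) ** 2)  # 4 of the 16 elements in Y
```
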


NORMAL CORRELATION


A fifth interpretation of r is more mathematical but of little practical 
value. We have already seen how a frequency distribution and its polygon 
can be thought of as smooth, conforming perhaps to the equation of the 
normal curve. A correlation table is a frequency distribution, a picture or 
graph of which requires a third dimension. If we were to replace each tally 
in a scatter diagram by a thin block, there would result something ana- 
logous to the histogram except that it would be three dimensional—the 
heights of the stacks of blocks would indicate the frequencies for the 
various cells. Now suppose that this mound of blocks is by some method 
smoothed to a surface, and we consider the total volume under the surface 
(between the surface and the XY plane) as representing N. Then the 
number of cases falling between two given X values and simultaneously 
between two given Y values will be approximately the volume of that 
portion of the mound which has as its base the rectangle or square formed 
by the intersections of the two X and two Y values. If the regression lines 
are linear, if the array distributions are normal and homoscedastic, and if 
the marginal distributions are normal, the resulting surface is termed the 
normal correlation surface, and the equation of the surface can be written 
as 


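$$z = \frac{N}{2\pi \sigma_x \sigma_y \sqrt{1 - r^2}} \exp\left[ -\frac{1}{2(1 - r^2)} \left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x \sigma_y} \right) \right] \tag{9.16}$$
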
A number of important properties of the normal correlation surface can 
be deduced from this equation and its integral. For instance, the standard 
error of estimate can be derived from formula (9.16), and it can also be 
shown that the contour lines which represent different altitudes on the 
mound, i.e., different frequencies, will be concentric ellipses, and that if 
r = 0, the contour lines will become concentric circles. If the equation is 
written with N equal to unity, by double integration the probability of an
individual's falling between two particular X values and between two Y
values can be determined. Tables are available which can be utilized for
this purpose.†
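
In place of tables, the double integration can be approximated numerically. The Python sketch below evaluates the normal correlation surface with N equal to unity and sums it over a rectangle by the midpoint rule; the function names, the step count, and the choice of the midpoint rule are assumptions made for illustration, and x and y are taken as deviations from their means.

```python
import math

def surface(x, y, r, sx=1.0, sy=1.0, n=1.0):
    # Height of the normal correlation surface at the point (x, y).
    q = (x / sx) ** 2 + (y / sy) ** 2 - 2 * r * x * y / (sx * sy)
    scale = n / (2 * math.pi * sx * sy * math.sqrt(1 - r * r))
    return scale * math.exp(-q / (2 * (1 - r * r)))

def rect_probability(x_lo, x_hi, y_lo, y_hi, r, steps=200):
    # Midpoint-rule approximation to the volume over the rectangle.
    dx = (x_hi - x_lo) / steps
    dy = (y_hi - y_lo) / steps
    total = 0.0
    for i in range(steps):
        x = x_lo + (i + 0.5) * dx
        for j in range(steps):
            y = y_lo + (j + 0.5) * dy
            total += surface(x, y, r)
    return total * dx * dy

# With r = 0 the result is the product of two one-variable probabilities.
print(rect_probability(-1, 1, -1, 1, r=0.0))
# With r = .60 more of the volume lies over the same central square.
print(rect_probability(-1, 1, -1, 1, r=0.6))
```
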


LIMITS FOR r 


Attention is called to the fact that definition formula (8.1) becomes
$r = \Sigma z_x z_y / N$ when written in terms of standard scores for both variables.
This indicates specifically that the correlation coefficient is a statistical
average, the average of the cross products of standard scores. Suppose that
we ask what happens when the correlation is perfect in the sense that each
individual's $z_x$ score equals his $z_y$ score. If this is true, the sum $\Sigma z_x z_y$ would
be the same as $\Sigma z^2$, which when divided by N gives 1.00. Thus the upper
limit for r is +1.00. Now suppose a perfect inverse relationship, such that
an individual's $z_x$ and $z_y$ are the same except for sign, one being positive
whereas the other is negative. If this holds true for all the cases, the sum
$\Sigma z_x z_y$ can be written as $\Sigma z(-z)$ or $-\Sigma z^2$, which when divided by N gives
-1.00 as the limit for perfect negative correlation.

† Pearson, Karl, Tables for statisticians and biometricians, part II, Cambridge:
Cambridge University Press, 1931. See Tables 8 and 9.

As exercises, the student should show that multiplying or dividing
either X or Y or both by a (positive) constant, or X by one constant and Y by
another, will not change r, and that adding or subtracting a constant does
not affect the value of r.
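
Before attempting the algebra, the student may find it helpful to see the limits and the invariance of r numerically. The Python sketch below computes r as the mean cross product of standard scores; the function names and the scores are invented, and the constants used for rescaling are taken to be positive.

```python
import math

def standard_scores(values):
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / s for v in values]

def r_from_standard_scores(xs, ys):
    # r as the average of the cross products of standard scores.
    zx, zy = standard_scores(xs), standard_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

xs = [3, 5, 6, 8, 11]   # hypothetical scores
ys = [2, 7, 5, 9, 10]

print(r_from_standard_scores(xs, xs))                 # identical z scores
print(r_from_standard_scores(xs, [-x for x in xs]))   # z scores opposite in sign
print(r_from_standard_scores(xs, ys))
# Rescale X and Y by different positive constants and shift both.
print(r_from_standard_scores([3 * x + 2 for x in xs],
                             [0.5 * y - 7 for y in ys]))
```
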


SUMMARY 


The five suggested methods for interpreting the correlation coefficient 
may be briefly summarized here. 

1. r is associated with the rate at which one variable changes with
another. This assumes that the regression line so interpreted is linear. 

2. r tells us how accurately we can predict by a regression equation. 
The standard error of estimate permits one to infer the possible magnitude 
of the prediction error, whereas the coefficient of alienation indicates the 
reduction in error over that error which would exist if there were no 
correlation. This interpretation assumes that the regression line used in 
predicting is linear and that variation about this line is normal and homos- 
cedastic. 

3. $r^2$ gives the proportion of variance in Y predictable from, or attributable
to, variation in X. This assumes linearity for the regression of Y on X
and requires caution in assuming the direction of cause and effect. 

The student should attempt to visualize the meaning of these three 
principal methods of interpreting correlation. In particular, he should 
note the meaning of $S_y$, $S_{y'}$, and $S_{y \cdot x}$ (or their counterparts with the
subscripts y and x interchanged). The first, $S_y$, holds for the marginal
distribution of all Ys; $S_{y'}$ pertains to the variability of all Y values as predicted
from X; the third, $S_{y \cdot x}$, is a measure of the variation about the regression
line for predicting Y from X. 

For none of these three interpretations of r do we have to assume normal 
distributions on the margins. However, it is possible and likely that 
nonlinearity, lack of homoscedasticity, and nonnormality of arrays will 
tend to be associated with skewness in one or both the marginal distri- 
butions. 

4. r or $r^2$ can be interpreted in terms of the proportion of elements
common to the two variables provided we are willing to make rather 
hazardous and unrealistic assumptions as regards the nature of the 
variables. 

5. r can be interpreted mathematically in terms of the equation for the 
normal correlation surface. This assumes that both regressions are linear, 
that homoscedasticity and normality hold for both the horizontal and 
vertical array distributions, and that both marginal distributions are 
normal in form. 

The nature of the investigation will usually dictate or suggest the 
appropriate interpretation. Ordinarily the fifth will not be used in connec- 
tion with the application of the correlational method, whereas the fourth 
rests on assumptions which can seldom be met. 


Chapter 10 


FACTORS WHICH AFFECT THE 
CORRELATION COEFFICIENT 


Before we interpret or draw conclusions from a particular correlation 
coefficient, it is necessary that we ask ourselves what factors might have 
affected its magnitude? The size of an obtained r depends upon several 
specific conditions, and, even though it is not always essential that correc- 
tions be applied, the investigator must forever be on the lookout for 
correlations which deviate from their “true” value because of the operation 
of disturbers. This chapter is devoted to a discussion of the more common 
factors which influence r. 

It is assumed that errors in computation have not been permitted—that 
all arithmetical work has been checked. It is also assumed that sufficient 
intervals have been used so as to make unnecessary the application of 
Sheppard's correction for grouping; if more than twelve intervals have 
been used, the slight increase in r which results from correcting the stand- 
ard deviations will be negligible. Certain textbooks have advocated a 
correction to r for smallness of the sample, which correction reduces r by 
a negligible amount. In view of the magnitude of the effects of other 
factors on r, these two possible corrections seem trifling. 


SELECTION 


One of the first questions which must be faced is: Do the cases on 
which r is based represent a random sampling of some defined population, 
or have selective factors so operated as to increase or decrease r? The 
literature of psychology is not free from correlation coefficients which are 
decidedly different from values that would have been obtained had the 
sampling been random. This is not to say that any investigator has willfully
selected his cases so as to produce correlation, but rather to say that 
unwitting errors are frequently present in spite of an effort to avoid 
selective factors. 


SAMPLING ERRORS 


Even though we may feel reasonably sure of the randomness of the 
sample on which an r is based, it is still necessary to consider the obtained r 
in terms of variable errors due to sampling. Any r based on N pairs of 
observations will differ more or less from the universe, or population, 
value which is here conceived of as the value of the correlation coefficient 
which we would obtain if we had an infinitely large sample. Many of the 
older texts gave $(1 - r^2)/\sqrt{N}$ as the standard error of r, but failed to point
out a serious limitation as regards interpretation: that this is an approxi- 
mation and that rs for successive samples are not distributed normally 
unless N is large and/or the universe value is near zero. 

Before further discussion it should be said that some measure of the 
sampling fluctuation of the correlation coefficient is highly desirable for 
any of three reasons. (1) We may wish to say whether an obtained r can 
be taken as representing a real, nonchance, correlation, i.e., whether it
deviates sufficiently far from zero so that we cannot regard it as a chance 
fluctuation from no relationship; (2) we may wonder whether a given r 
deviates significantly from some a priori or expected value; or (3) we may 
raise the question of whether two obtained rs are significantly different from 
each other. The answers to these questions must be in terms of probability, 
and the probability figure which we accept as indicating significance 
determines the confidence with which we regard any such conclusions as 
we set forth.

If N is greater than 50, and if we are interested in saying whether or not 
an r (of .50 or less, usually) is significantly different from zero, we can 


determine its standard error by 
$$\sigma_r = \frac{1}{\sqrt{N - 1}} \tag{10.1}$$


and then divide the obtained r by this standard error in order to secure a z 
value with which to enter the normal probability table. If $r/\sigma_r$ is greater
than 2.58, we can conclude with a fairly high degree of sureness that the 
true or universe value of r is likely to be greater than zero. 

For N less than 50, it is necessary to follow a different procedure. It can 
be shown that, if the correlation coefficient is computed for successive 
samples drawn from a population for which the correlation is zero, the 
successive values of

$$t = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}} \tag{10.2}$$

will follow the t distribution with df = N - 2. If a sample t reaches the
.01 level of significance, one would conclude that it is not a chance deviation
from zero, or that some correlation exists between the two variables
involved.
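
The computation called for by formula (10.2) is slight. A Python sketch follows; the function name `t_for_r` and the example values of r and N are assumptions for illustration, and the resulting t must still be referred to a t table with the stated degrees of freedom.

```python
import math

def t_for_r(r, n):
    """t for testing whether r differs from zero; df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical case: r = .50 based on N = 27 pairs.
r, n = 0.50, 27
print(round(t_for_r(r, n), 3), "with df =", n - 2)
```
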

From the foregoing expression, it would appear that the t for testing
the significance of correlation is nothing more than an $r/s_r$, with
$s_r = \sqrt{(1 - r^2)/(N - 2)}$ as an estimate of the sampling error of r. However,
there are subtle mathematical reasons why such an interpretation is not 
permissible. 

The student may wonder why the df is taken as N - 2. Actually, when we
test the significance of an r, we are testing the significance of regression. If
r is zero, the regression is zero in the sense that the regression coefficient or
slope of the regression line is zero. Now a linear regression line involves
two constants, its slope and its intercept; hence 2 degrees of freedom are
lost in fitting the line. Suppose N = 2, and that the two X scores differ;
likewise, the two Y scores. Imagine these pairs of scores plotted in a
scatter diagram, and a regression line fitted or a correlation coefficient
computed. The regression line would go through both plotted points;
therefore for the sample of two cases the prediction would be perfect and r
would be unity. The student may, as an exercise, prove algebraically that
when N = 2 and when there is variation in both X and Y, the correlation
must be +1 or -1. In other words, with N = 2 there is no freedom for
sampling variation in the numerical value of r.
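
The case of N = 2 is easily tried out, although a trial is of course no substitute for the algebraic proof. In the Python sketch below the function name and the pairs of scores are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Any two cases with differing X scores and differing Y scores.
print(pearson_r([61, 70], [12, 45]))    # both scores rise together
print(pearson_r([61, 70], [45, 12]))    # one rises as the other falls
print(pearson_r([0.3, 0.2], [100, 7]))
```
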

The t test of r assumes normality and homoscedasticity either for the
vertical array or for the horizontal array distributions. Nothing is assumed
about the total X and Y distributions. There is evidence, as with the t test
for means, that sizable violations of the assumptions are tolerable, but 
there is always comfort in knowing that the assumptions are fairly well 
met. 

It might be remarked at this place that if the sum of squares of the 
deviations about a regression line were divided by N — 2, the df, we would 
have an unbiased estimate of the error of estimate variance. This added 
precision seems unnecessary in the practical situation where prediction 
(regression) equations are usually based on sizable Ns. 

Formulas for the standard error of r, when $r_{pop}$ is large, are misleading,
because for high values of $r_{pop}$ the distribution of successive sample values
is markedly skewed. This skewness becomes noticeable when $r_{pop}$ reaches
.40 or .50 and increases rapidly as $r_{pop}$ nears unity. The skewness is also
a function of N. Because of this skewness the standard error of r loses its
meaning; it cannot be expected to yield a trustworthy answer as to whether 
an obtained r deviates significantly from some a priori value, nor can the 
significance of the difference between two rs be determined by substituting 
in the ordinary formula for the standard error of a difference. 

The r to z transformation. Professor R. A. Fisher has developed a 
very useful and accurate technique for handling sampling errors for high 
values of r. This procedure is also applicable for low rs and can be used 
when N is large or small. He employs a transformation 


$$z = \tfrac{1}{2} \log_e (1 + r) - \tfrac{1}{2} \log_e (1 - r) \tag{10.3}$$

or

$$z = 1.1513 \log_{10} \frac{1 + r}{1 - r} \tag{10.4}$$

which has two distinct advantages: (1) the distribution of z for successive
samples is independent of the universe value, i.e., for a given N the sampling
distribution will have the same dispersion for all values of $r_{pop}$; (2) the
distribution of z for successive samples is so nearly normal that it can be 
treated as such with very little loss of accuracy. The standard error of z is 


$$\sigma_z = \frac{1}{\sqrt{N - 3}} \tag{10.5}$$

Note on notation: Since the standard error of z is a theoretical value,
dependent solely on N, and hence does not involve estimation from the
sample, it is symbolized as $\sigma_z$ rather than as $S_z$ or $s_z$.

If we wish to state the .99 confidence limits for r, we transform the
obtained r to z by formula (10.4) or by Table B of the Appendix, determine
$\sigma_z$, find $z + 2.58\sigma_z$ and $z - 2.58\sigma_z$, and then transform these two z values
back to rs by using Table C. As an example and in contrast to the less exact
procedure of taking $r \pm 2.58 s_r$, where $s_r = (1 - r^2)/\sqrt{N}$, let us suppose
an r of .90 based on an N of 50. The standard error of r by the usual
formula is .027; whence .90 ± (2.58)(.027) yields the values .830 and .970
as confidence limits for the universe value. Now, if we utilize the z
transformation, we find z = 1.47, and $\sigma_z$ = .146, whence 1.47 ± (2.58)(.146)
gives 1.093 and 1.847. These two values are then transformed back
to the two r values, .798 and .951, which it will be noted differ from the
confidence limits for r as determined by using the classical $s_r$.
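
The worked example can be retraced without the Appendix tables, since the r to z transformation is the inverse hyperbolic tangent and the return from z to r is the hyperbolic tangent. The Python sketch below is illustrative; the function names are invented, and because it carries more decimals than the hand computation its results may differ from the figures in the text by about a unit in the third decimal place.

```python
import math

def fisher_limits(r, n, k=2.58):
    """Confidence limits for r by way of the z transformation."""
    z = math.atanh(r)                  # same as formula (10.3)
    sigma_z = 1 / math.sqrt(n - 3)     # formula (10.5)
    return math.tanh(z - k * sigma_z), math.tanh(z + k * sigma_z)

def classical_limits(r, n, k=2.58):
    # The less exact procedure: r plus and minus k times (1 - r^2)/sqrt(N).
    s_r = (1 - r * r) / math.sqrt(n)
    return r - k * s_r, r + k * s_r

print([round(v, 3) for v in fisher_limits(0.90, 50)])
print([round(v, 3) for v in classical_limits(0.90, 50)])
```

The limits from the z transformation are not symmetrical about .90, which reflects the skewness of the sampling distribution of r for high universe values.
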

Note: Since the foregoing 2 is not a relative deviate, it should not be 
referred to as a standard score. А A 
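
The confidence-limit procedure just described can be sketched in a few lines of Python. This is only an illustration: the function names are ours, and math.tanh is assumed to stand in for the table lookup that converts z back to r.

```python
import math

def fisher_z(r):
    """Formula (10.3): z = 1/2 ln(1 + r) - 1/2 ln(1 - r)."""
    return 0.5 * math.log(1 + r) - 0.5 * math.log(1 - r)

def z_to_r(z):
    """Inverse of the r to z transformation (replaces the table lookup)."""
    return math.tanh(z)

def r_confidence_limits(r, n, deviate=2.58):
    """Limits for the universe r; deviate=2.58 gives the .99 limits."""
    z = fisher_z(r)
    sigma_z = 1 / math.sqrt(n - 3)  # formula (10.5)
    return z_to_r(z - deviate * sigma_z), z_to_r(z + deviate * sigma_z)

# The example from the text: r = .90, N = 50.
lower, upper = r_confidence_limits(0.90, 50)
print(lower, upper)
```

Because the code carries z and σ_z to full precision instead of rounding them to 1.47 and .146, its limits agree with the .798 and .951 of the text only to within about .001.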

Difference between rs. If we wish to determine the significance of
the difference between two rs, both are transformed into zs, and the
standard error of the difference between the two zs is obtained by

    \sigma_{D_z} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}    (10.6)

and then the ratio of the difference to its standard error is treated in the
usual manner. If the zs are significantly different, we conclude that the
two rs are significantly different.
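
A minimal sketch of this test for two rs from independent samples follows. The function name and the sample values (.60 and .40, each with N = 103) are hypothetical, chosen only to show the arithmetic.

```python
import math

def ratio_for_independent_rs(r1, n1, r2, n2):
    """Ratio of the difference between two zs to its standard error."""
    z1, z2 = math.atanh(r1), math.atanh(r2)      # r to z, formula (10.3)
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # formula (10.6)
    return (z1 - z2) / se_diff, se_diff

# Hypothetical values, for illustration only.
ratio, se_diff = ratio_for_independent_rs(0.60, 103, 0.40, 103)
print(ratio, se_diff)
```

The ratio is referred to the normal table in the usual manner.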

Suppose we have the correlation between X_1 and X_2 and also between
X_1 and X_3, with both rs based on the same sample of N cases, and we wish
to decide whether there is a significant difference between r_12 and r_13. The
foregoing method is not applicable because we need to allow for the fact
that, for successive samplings, r_12 and r_13 are not independently distributed,
but correlated. The standard error of the difference must include a
subtractive r term involving the correlation between the correlation
coefficients. The methods for estimating this needed correlation are none too
satisfactory, but there is a test which is interpretable by way of the t table
for N small and by way of the normal table for N large. It has been shown
that

    t = (r_{12} - r_{13})\sqrt{\frac{(N - 3)(1 + r_{23})}{2(1 - r_{23}^2 - r_{12}^2 - r_{13}^2 + 2r_{23}r_{12}r_{13})}}    (10.7)

follows the t distribution with N - 3 degrees of freedom when the null
hypothesis of no difference is true. If t is significant, we conclude that one
variable correlates higher than the other with X_1.
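
Formula (10.7) is easily mistyped, so a short computational sketch may help. The function name and the three correlations used below are hypothetical.

```python
import math

def t_for_correlated_rs(r12, r13, r23, n):
    """Formula (10.7): t for r12 versus r13 from one sample of n cases."""
    bracket = 1 - r23**2 - r12**2 - r13**2 + 2 * r23 * r12 * r13
    t = (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * bracket))
    return t, n - 3  # t and its degrees of freedom

# Hypothetical values, for illustration only.
t, df = t_for_correlated_rs(0.50, 0.30, 0.40, 103)
print(t, df)
```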

Averaging correlations. When we have two (or more) sample values 
for the correlation between two variables we may wish to average the rs 
(1) in case it is known that the samples have been drawn from the same 
population or (2) in case it can be assumed (because the rs are not signifi- 
cantly different from each other) that the samples have been drawn from 
equally correlated populations. An appropriate procedure is to convert
each r to z, then take a weighted (each z by the inverse of its sampling
variance) average of the zs. Thus, for three sample values this weighted
average is given by

    z_{av} = \frac{(N_1 - 3)z_1 + (N_2 - 3)z_2 + (N_3 - 3)z_3}{(N_1 - 3) + (N_2 - 3) + (N_3 - 3)}

This z_av can be transformed back to an r, and any significance test concerning
such an average r would be made on z_av, which has a standard error of

    \sigma_{z_{av}} = \frac{1}{\sqrt{(N_1 - 3) + (N_2 - 3) + (N_3 - 3)}}
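
The weighted averaging of rs by way of z can be sketched as follows. The function name and the sample values are hypothetical; the weights N - 3 are the inverses of the sampling variances of the zs.

```python
import math

def average_r(rs, ns):
    """Average several rs by weighting each z by N - 3."""
    weights = [n - 3 for n in ns]
    z_av = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    se_z_av = 1 / math.sqrt(sum(weights))  # standard error of z_av
    return math.tanh(z_av), z_av, se_z_av

# Hypothetical sample values, for illustration only.
r_av, z_av, se_z_av = average_r([0.45, 0.55, 0.50], [23, 53, 33])
print(r_av, z_av, se_z_av)
```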


Sampling errors of regression coefficients. Those who are ascertaining
relationships involving two variables, one of which can definitely be
characterized as independent (X) and the other as dependent (Y), or one
as an antecedent and the other as a consequent variable, may prefer to
specify the relationship in terms of the regression constants, B and A.
Both of these are, of course, estimates of unknown population values and
are therefore subject to sampling errors. Ordinarily A is of little interest
whereas B specifies the rate of change and may be used for testing
hypotheses. Does the sample slope vary significantly from some hypothetical
value, B_h? Or does the slope differ significantly from zero? Occasionally,
we may have two regression coefficients (same variables) and wish to test
the significance of the difference between the two slopes, B_1 and B_2.

To test whether or not a single slope, B, differs significantly from zero or
some other a priori value, we need an estimate of its standard error. The
classical standard error of B_yx was taken as

    \sigma_{B_{yx}} = \frac{S_y\sqrt{1 - r^2}}{S_x\sqrt{N}}

Note that the formula involves a ratio which is a function of the Y
measurement units relative to the X measurement units. This is reasonable since
the slope is also a ratio of Y to X units. (The standard error of any
statistic must be in the same units as the statistic.) The upstairs part
of the formula is the familiar standard error of estimate. It is reasonable
that the sampling stability of B_yx should depend on the variability within
the arrays because, in effect, we are fitting the regression line to the
array means (weighted by their Ns), and the stability of these means is a
function of the array variances.

Greater precision in testing hypotheses regarding B will be achieved by
having an unbiased estimate of its sampling error. It will facilitate
exposition to note that for any variable, V, the variance is given by
S^2_v = Σv^2/N. If we have S^2_v and wish to recover the sum of the squares of
the deviations, Σv^2, we can use Σv^2 = NS^2_v. To secure an unbiased
estimate we would have s^2_v = Σv^2/(N - 1) = NS^2_v/(N - 1).
Parenthetically, it might be noted that since S_y/S_x = s_y/s_x, the slope
would be unchanged by introducing unbiased estimates of the two standard
deviations.

For the sampling variance of B_yx we need

    s^2_{B_{yx}} = \frac{s^2_{y \cdot x}}{N S_x^2}    (10.8)

in which for reasons not yet specified we have allowed a mixture of biased
and unbiased estimates. To get s^2_{y.x}, we would need to divide Σ(Y - Y')^2
by its df, N - 2. But S^2_{y.x} = Σ(Y - Y')^2/N, hence Σ(Y - Y')^2 = NS^2_{y.x}
= NS^2_y(1 - r^2). Therefore

    s^2_{y \cdot x} = \frac{\Sigma(Y - Y')^2}{N - 2} = \frac{N S_y^2(1 - r^2)}{N - 2}

which leads to

    s^2_{B_{yx}} = \frac{N S_y^2(1 - r^2)/(N - 2)}{N S_x^2} = \frac{S_y^2(1 - r^2)}{(N - 2) S_x^2}

the square root of which gives

    s_{B_{yx}} = \frac{S_y\sqrt{1 - r^2}}{S_x\sqrt{N - 2}}    (10.9)


To test the null hypothesis that the slope for the population is zero, we
have

    t = \frac{B_{yx}}{s_{B_{yx}}} = \frac{r(S_y/S_x)}{S_y\sqrt{1 - r^2}/(S_x\sqrt{N - 2})}

with N - 2 degrees of freedom. Note that since S_y/S_x cancel, it follows
that if we used unbiased estimates, s_y and s_x, they would also cancel. With
the Ss cancelled we have the ratio

    t = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}}

which is the t test for the significance of r. When testing the null hypothesis
of zero slope we are doing nothing more than testing the null hypothesis of
zero correlation, a fact that certainly could have been anticipated since
the only way that B_yx = r(S_y/S_x) can be zero is for r to be zero. The
mathematical purist can say that B could be zero when S_y is zero, but when
S_y is zero the slope and r become indeterminates: you cannot have a
relationship or a slope in the absence of variation for either variable.
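
The identity of the two tests can be checked numerically. The sketch below computes t for the slope from the residuals about the fitted line and t for r from the ratio given above; the function name and the six pairs of scores are hypothetical.

```python
import math

def slope_and_r_tests(xs, ys):
    """Return (t for zero slope, t for zero correlation), each with N - 2 df."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sum_x2 = sum((x - mx) ** 2 for x in xs)   # equals N * S_x^2
    sum_y2 = sum((y - my) ** 2 for y in ys)
    sum_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sum_xy / sum_x2                        # slope B_yx
    a = my - b * mx                            # intercept A_yx
    r = sum_xy / math.sqrt(sum_x2 * sum_y2)
    resid_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_b = math.sqrt((resid_ss / (n - 2)) / sum_x2)   # formula (10.8)
    t_slope = b / s_b
    t_r = r / math.sqrt((1 - r ** 2) / (n - 2))
    return t_slope, t_r

# Hypothetical scores, for illustration only.
t_slope, t_r = slope_and_r_tests([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
print(t_slope, t_r)
```

Apart from rounding in the last decimal places, the two printed values are the same.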

For those who have an aversion to r and standard deviations and who
persist in the erroneous belief that a test of B_yx differs from a test of r, it
should be noted that r and the Ss can be avoided by expressing B_yx as in
equation (9.7) and taking the following as an expression for the unbiased
estimate of the sampling error of B_yx:

    s_{B_{yx}} = \sqrt{\frac{\Sigma(Y - Y')^2/(N - 2)}{[N\Sigma X^2 - (\Sigma X)^2]/N}}

in which

    \Sigma(Y - Y')^2 = \Sigma Y^2 - A_{yx}\Sigma Y - B_{yx}\Sigma XY    (10.10)

with A_yx and B_yx as in equation (9.7).

Difference between regression coefficients. Let B_1 and B_2 be the values
for B for two independent samples of N_1 and N_2 persons. The t test for
the difference between the two Bs is analogous to that for testing the
difference between independent means. We need s_{D_B} which, it might be
guessed, will follow the usual pattern:

    s_{D_B} = \sqrt{s^2_{B_1} + s^2_{B_2}}

for which we need the best unbiased estimates of the two variances under
the radical. Instead of using the two residual variances separately, a
combined estimate is obtained by combining the sums of squares of
deviations about the respective regression lines and then dividing by the
combined df, or N_1 + N_2 - 4. The best estimate of the (assumed to be)
common residual variance may be expressed in symbols as

    s^2_{y \cdot x} = \frac{\Sigma(Y - Y')_1^2 + \Sigma(Y - Y')_2^2}{N_1 + N_2 - 4}

with each numerator term calculable by equation (10.10) written separately
for the two groups. If the rs and Ss have been computed, we may use the
exact equivalent

    s^2_{y \cdot x} = \frac{N_1 S^2_{y_1}(1 - r_1^2) + N_2 S^2_{y_2}(1 - r_2^2)}{N_1 + N_2 - 4}

Then by utilizing (10.8), we have

    s^2_{D_B} = \frac{s^2_{y \cdot x}}{N_1 S^2_{x_1}} + \frac{s^2_{y \cdot x}}{N_2 S^2_{x_2}}

and t = (B_1 - B_2)/s_{D_B} as a ratio that follows the t distribution with
N_1 + N_2 - 4 degrees of freedom.
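
When the rs and Ss are already at hand, the test can be sketched from those summary values alone. The function name and the figures for the two groups are hypothetical; S here is the ordinary standard deviation (divisor N), as in the text.

```python
import math

def t_for_slope_difference(n1, sx1, sy1, r1, n2, sx2, sy2, r2):
    """t for B_1 - B_2 from two independent samples, df = N_1 + N_2 - 4."""
    b1 = r1 * sy1 / sx1
    b2 = r2 * sy2 / sx2
    df = n1 + n2 - 4
    # Pooled estimate of the (assumed common) residual variance.
    pooled = (n1 * sy1**2 * (1 - r1**2) + n2 * sy2**2 * (1 - r2**2)) / df
    # Formula (10.8) applied to each group, then summed.
    var_diff = pooled / (n1 * sx1**2) + pooled / (n2 * sx2**2)
    return (b1 - b2) / math.sqrt(var_diff), df

# Hypothetical summary values, for illustration only.
t, df = t_for_slope_difference(52, 10, 10, 0.6, 52, 10, 10, 0.3)
print(t, df)
```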
It is of interest to note that whereas the t tests for the two hypotheses,
r_pop zero and B_pop zero, are identical, the test for the difference between Bs
is not the same as that for the difference between the two respective rs.
That this should be so may be understood by considering two rs and two
Bs based on two samples one of which has a larger range of Xs (larger S_x)
than the other. The two slopes could very well be much alike and yet one
r be considerably higher than the other. This is a point in favor of
"regression" instead of "correlational" analysis: in the
independent-dependent variable situation, the slope is not a function of
curtailment on X, or the range of X.


RANGE OR SPREAD OF TALENT


The magnitude of the correlation coefficient varies with the degree of 
heterogeneity (with respect to the traits being correlated) of the sample. 
If we are drawing a sample from a group which is restricted in range with 
regard to either or both variables, the correlation will be relatively low. 
Thus the restricted range of intelligence is one factor which leads to lower
correlation between intelligence and grades for college students than that
usually found for high school groups. If the range with respect to one
variable has been curtailed, and we know the standard deviation for an
uncurtailed distribution, it is possible to adjust the correlation for the
difference in range, provided we can be sure of the tenability of two
assumptions: that the regressions are linear and that the arrays are
homoscedastic for the scatter based on the uncurtailed distribution. If the
curtailment is in variable X, and we let

    sd_x = S for curtailed distribution
    SD_x = S for uncurtailed distribution
    r_xy = correlation for curtailed range
    R_xy = correlation for uncurtailed range

the relationship by which we would predict R_xy from sd_x, SD_x, and r_xy is
given by

    R_{xy} = \frac{r_{xy}(SD_x/sd_x)}{\sqrt{1 - r_{xy}^2 + r_{xy}^2(SD_x/sd_x)^2}}    (10.11)


Table 10.1. Values for r_xy for R_xys of .30, .40, ..., .80 with sd_x/SD_x
values of .90, .80, ..., .50

                              R_xy
    sd_x/SD_x    .30    .40    .50    .60    .70    .80
       .90      .272   .366   .461   .559   .662   .768
       .80      .244   .330   .419   .514   .617   .730
       .70      .215   .292   .375   .465   .566   .682
       .60      .185   .253   .327   .410   .507   .625
       .50      .155   .213   .277   .351   .440   .555


Obviously, if we have R instead of r, the value of r for a restricted range can
be estimated by formula (10.11). All we need to do is interchange SD_x
and sd_x, R and r, and then substitute to find r. The estimation of r need not
be made in ignorance of whether the assumptions of linearity and
homoscedasticity can be met; an examination of the accessible scatter for the
uncurtailed range will reveal the facts.

Formula (10.11) indicates definitely that the magnitude of the correlation
coefficient is a function of the degree of heterogeneity with respect to one
of the traits being correlated. A better appreciation of the extent of this
influence can be had by examining Table 10.1 which gives, for varying
values of R_xy along the top and different sd_x/SD_x ratios along the left, the
corresponding values of r_xy. It can be shown that double selection, i.e.,
curtailment on both variables, tends to depress the correlation coefficient.
Since the formulas for "correcting" for double curtailment are not too
satisfactory, none is given here.
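
Formula (10.11) and its interchange can be sketched as a pair of small functions, which also serve to regenerate any entry of Table 10.1. The function names are ours; the assumptions of linearity and homoscedasticity stated above apply to the code just as they do to the formula.

```python
import math

def uncurtailed_r(r, sd, SD):
    """Formula (10.11): estimate R_xy from r_xy, sd_x, and SD_x."""
    k = SD / sd
    return r * k / math.sqrt(1 - r**2 + r**2 * k**2)

def curtailed_r(R, sd, SD):
    """The same formula with SD and sd, R and r interchanged."""
    k = sd / SD
    return R * k / math.sqrt(1 - R**2 + R**2 * k**2)

# One table entry: R_xy = .80 with sd_x/SD_x = .90.
print(round(curtailed_r(0.80, 0.9, 1.0), 3))   # 0.768
```

Applying uncurtailed_r to the value returned by curtailed_r recovers the original R, which is a convenient check on the algebra.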

One important rule emerges from the foregoing: standard deviations 
should always be reported along with correlation coefficients, and some 
indication should be given as to variation typically found for the variables. 


EFFECTS OF UNRELIABILITY 


Before considering the effect of unreliability, or errors of measurement, 
on the correlation between two variables, it is necessary that we digress 
to explain briefly what is meant by reliability. If we were assigned the 
task of determining the height of an individual by the use of a tape 
measure, we might be satisfied with one measurement, but unfortunately a 
single determination might not be entirely free from error. To overcome 
this, two or more measures are averaged on the assumption that the chance 
or variable errors will more or less cancel out. If we compute the standard 
deviation of the distribution of several measurements (of the same thing), a 
summary figure indicating the possible magnitude of the variable errors 
will be obtained. This S neither pertains to nor measures the magnitude of 
a possible constant error, i.e., an error which affects all the measurements 
in the same direction. We are here concerned only with the magnitude of
variable errors, or inaccuracies in measurement which are of a chance
nature.

Reliability. If we had the problem of determining the error in the
measurement of height, we could make several measurements on one
person and compute a measure of accuracy, or we might make just two
measures on each of several persons and take some function of the
difference between the two measurements for all N individuals as our gauge of
accuracy. Either scheme leads to an estimate of the size of the variable
errors that may be involved.

In psychological measurement, it is not always feasible or possible to
obtain more than two measures on an individual for a given trait; hence
it is necessary to use the second-mentioned scheme for determining the
accuracy of measurement. The mean or median absolute error may suffice,
but, as in physical measurement, we sometimes need to know the extent of
the variable errors in relation to the magnitude of the thing being measured,
i.e., the relative or percentage error. Psychologists have found it useful to
interpret variable errors, not with regard to the magnitude (a nearly
meaningless word in psychological tests) of the measures, but relative to
the variability of the trait for a specific group of individuals. The
correlation between two determinations is, as we shall soon see, one method of
expressing the accuracy of measurement relative to the trait dispersion.
Such a correlation is termed the reliability coefficient.
Suppose

    X   = an obtained score or measure for an individual
    X_t = his true score
    e   = a variable error, positive or negative

Then we can consider that

    X = X_t + e

or in deviation units

    x = x_t + e

The variance of the obtained scores will be

    S_x^2 = S_t^2 + S_e^2    (10.12)

provided we can assume x_t and e uncorrelated. This assumption seems
reasonable since the variable error, e, is supposed to be a chance affair, as
often positive as negative, and therefore its magnitude and direction should
not be related to anything else. Equation (10.12) can be stated in words:
the variance of the distribution of scores can be broken up into two
portions, the variance of the true scores and the variance due to errors of
measurement.

Suppose that for a given trait we have two measurements, each of which
is in error but not necessarily to the same extent or in the same direction.
Symbolically,

    x_1 = x_t + e_1
    x_2 = x_t + e_2

in which the es represent the errors which go with the two obtained scores.
The reliability coefficient is defined as the correlation between two
comparable measures of the same thing, i.e., the correlation between x_1
and x_2. (We need an x_1 and x_2 for each measured individual.) Thus we
have the reliability coefficient,

    r_{12} = \frac{\Sigma x_1 x_2}{N S_1 S_2} = \frac{\Sigma(x_t + e_1)(x_t + e_2)}{N S_1 S_2}
           = \frac{\Sigma x_t^2 + \Sigma x_t e_2 + \Sigma x_t e_1 + \Sigma e_1 e_2}{N S_1 S_2}

Dividing by N gives

    r_{12} = \frac{S_t^2 + r_{t e_2} S_t S_{e_2} + r_{t e_1} S_t S_{e_1} + r_{e_1 e_2} S_{e_1} S_{e_2}}{S_1 S_2}    (10.13)

If we assume all three rs in the numerator equal to zero, we have

    r_{12} = \frac{S_t^2}{S_1 S_2}

It is assumed that we are correlating comparable measures of the same
thing or trait, comparable in the sense that S_1 = S_2, and S_{e_1} = S_{e_2}.
(The same trait is implied in that x_1 and x_2 are measures of x_t.) Whence
we have

    r_{xx} = \frac{S_t^2}{S_x^2}    (10.14)

where S_x = S_1 = S_2. The reliability coefficient can be interpreted as a
proportion, since from formula (10.12) we have

    \frac{S_t^2}{S_x^2} + \frac{S_e^2}{S_x^2} = 1

i.e., the reliability coefficient represents the proportion of the variance
of the obtained scores which is due to the variance of the true scores. It
follows that 1 - r_xx gives the proportion of the variance which is due to
errors of measurement.
Obviously, the reliability coefficient can, by substitution from formula
(10.14) into the foregoing expression, also be written as

    r_{xx} = 1 - \frac{S_e^2}{S_x^2}    (10.15)

which indicates clearly that the reliability coefficient is a function of the
magnitude of the variable error relative to the variability of the trait in
question. It also follows from formula (10.15) that the error of
measurement can be stated in terms of the reliability coefficient and S_x; thus,

    S_e = S_x\sqrt{1 - r_{xx}}    (10.16)


That S_e is to be interpreted as the standard error of measurement may
be clarified if we note that, when x (= x_1 or x_2) is taken as evidence of the
true score, x - x_t becomes the error, and the standard deviation of such
errors will be S_e, as can be shown by easy algebra. If it were possible to
secure a large number of measures on an individual, we would expect these
measures to distribute themselves normally about the true score with a
standard deviation corresponding to S_e. Thus, if the result of one testing
yields an IQ of 80, and if S_e = 3, we can conclude with high confidence
that the individual's true position, on the scale of measured (obtained)
IQs, is somewhere between 71 and 89 (80 ± 3S_e), and with fair confidence
that it is somewhere between 74 and 86.
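
The arithmetic of formula (10.16) and of the IQ illustration can be sketched as follows. The text gives only S_e = 3; the S_x of 15 and the reliability of .96 used here are assumed values, chosen because together they yield that S_e.

```python
import math

def standard_error_of_measurement(s_x, r_xx):
    """Formula (10.16): S_e = S_x * sqrt(1 - r_xx)."""
    return s_x * math.sqrt(1 - r_xx)

def score_band(score, s_e, k):
    """Limits score - k*S_e and score + k*S_e."""
    return score - k * s_e, score + k * s_e

s_e = standard_error_of_measurement(15, 0.96)  # assumed S_x and r_xx
print(s_e)
print(score_band(80, 3, 3))  # (71, 89), the "high confidence" limits
print(score_band(80, 3, 2))  # (74, 86), the "fair confidence" limits
```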

Determination of reliability. The foregoing argument regarding the 
interpretation of the reliability coefficient either as an indicator of relative 
accuracy or in terms of S, rests on the supposition that we have obtained 
the reliability coefficient as the result of correlating comparable measures 
of the same thing and that the variable errors are uncorrelated with them- 
selves and with the true scores. The practical determination of the 
reliability coefficient involves more, therefore, than the mere correlating of 
two sets of measurements. The conditions under which the two sets of 
scores are obtained must be scrutinized for possible violation of the 
requisite assumptions. Some of the difficulties involved in ascertaining 
the reliability of a psychological measurement are suggested in the 
following paragraphs. 

First let us note that the chance variable error, e, can be broken up into
many smaller components, at least logically, although not necessarily
experimentally. Thus we might set

    e = e_i + e_p + e_c + e_s + e_d + ...

in which

    e_i = error in the instrument or test
    e_p = error due to extraneous physical disturbance
    e_c = error due to physiological condition of individual
    e_s = error in scoring or in reading instrument
    e_d = error due to day-to-day fluctuations

Other sources of variable error might be added, or some of those listed 
might be broken up into more minute parts. It is not assumed that these 
several sources contribute an equal amount to the variance of e, nor is it 
assumed that these several components are entirely independent of each 
other. For instance, daily fluctuations might be influenced by physiological 
condition. 

The assumption of uncorrelated errors implies that e_1 is not correlated
with e_2. Of course the two scores for an individual might by chance contain
a variable error of the same magnitude and sign; we are here interested,
however, in whether an error which is chance for one score might tend in
general to affect the second score in the same manner. For example, an
upset stomach might lead to a reduced performance score, and if the
second test was administered the same day, this same chance factor would
affect the second performance score in the same direction. Thus in
examining any proposed scheme for determining the reliability of a test, we must
inquire as to whether any of the sources of error can affect the two
measurements on an individual in the same direction. If it seems reasonable to
suspect that errors are correlated, it follows that the obtained reliability
coefficient will be spuriously high since the presence of correlated errors
will not allow formula (10.13) to be reduced to (10.14).

The presence of e_i as a source of error may be appreciated if we regard
the items as providing a sampling of the performance (or responses) of
the individual. Consider the simple problem of measuring vocabulary
level. We can easily conceive of the individual's true score as the
proportion, p_t, of words in Webster's Unabridged Dictionary that he can
satisfactorily define. Instead of the time-consuming tedium of asking him
to define each word, we might resort to sampling. We could get a fairly
good sample of, say, n = 30 words by taking the fourth word from the top
of the right-hand column of every page ending in the numeral 55. The
standard error of his score would be given by √(p_t q_t/30), or more
generally by √(p_t q_t/n). Even though p_t is unknown (estimable from the
obtained proportion), it is readily seen that the larger the number of items
the smaller the error, and vice versa. Thus to reduce e_i, as instrumental
error, the length of the test is increased. This general principle holds even
though our vocabulary illustration is not quite analogous to what goes on in
test construction because a "universe" of items is rarely, if ever, available
for sampling and because test constructors tend to improve on randomness by
selecting items that possess certain desirable characteristics.
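
The dependence of this sampling error on the number of items can be sketched numerically. The true proportion of .50 is an assumed value (the text leaves it unknown), and the function name is ours.

```python
import math

def se_of_proportion(p, n):
    """Standard error of a proportion-correct score based on n sampled items."""
    return math.sqrt(p * (1 - p) / n)

# Assumed true proportion of .50; n = 30 as in the vocabulary illustration.
for n in (30, 120, 480):
    print(n, round(se_of_proportion(0.50, n), 4))
```

Each fourfold increase in the number of items halves the standard error, which is the sense in which lengthening a test reduces instrumental error.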

Let us consider a few of the "accepted" schemes for ascertaining 
reliability in order to see whether they are “acceptable” in light of the 
assumptions requisite to a sound reliability coefficient. These assumptions 
may be recapitulated in the form of three questions. Do the two tests or 
determinations represent measures of the same thing? Are the two series 
of measures comparable (comparable tests or instruments)? Is it possible 
or likely that the errors of measurement are correlated; i.e., can the error 
on the first test be correlated with the error on the second, or can the error 
on either be correlated with the true measure? 

For the ordinary mental, personality, or achievement test, reliability is 
usually ascertained by correlating supposedly equivalent (comparable?) 
forms, by correlating split halves (odd vs. even items or first half vs. 
second half of test), or by correlating test-retest scores. The test-retest 
method is of limited value in that there may be a memory carry-over from 
test to retest, in which case the retest will measure the same trait as the 
original test plus memory effects. In order to overcome this memory 
transfer, the retest may be administered some months after the first test, 
but this permits of a possible change in the trait or ability as a result of
maturation or experience.

Split-half reliability involves the correlating of two halves and applying
the Brown-Spearman formula to determine the reliability of scores based
on the whole test. This formula is easily derived. Let X_1 and X_2 stand for
the respective halves. Now r_12 would be the reliability for scores based on
either half, but in practice we always use total scores, defined as X_a
= X_1 + X_2. The reliability of X_a can be thought of in terms of the
correlation between X_a and an imaginary set of comparable scores, X_b
= X_3 + X_4, where X_3 and X_4 are scores on the two respective halves of a
nonexistent form of the test. Given information about X_1 and X_2, we seek
an expression for r_ab. In deviation units, x_a = x_1 + x_2 and x_b = x_3 + x_4,
hence we may write

    r_{ab} = \frac{\Sigma x_a x_b}{N S_a S_b} = \frac{\Sigma(x_1 + x_2)(x_3 + x_4)}{N S_a S_b}
           = \frac{\Sigma x_1 x_3 + \Sigma x_1 x_4 + \Sigma x_2 x_3 + \Sigma x_2 x_4}{N S_a S_b}

Dividing through by N and utilizing formula (8.1), and with formula (9.9)
as a basis for specifying S_a and S_b, we have

    r_{ab} = \frac{r_{13}S_1S_3 + r_{14}S_1S_4 + r_{23}S_2S_3 + r_{24}S_2S_4}{\sqrt{S_1^2 + S_2^2 + 2r_{12}S_1S_2}\,\sqrt{S_3^2 + S_4^2 + 2r_{34}S_3S_4}}

Now it is assumed that the X_1 and X_2 scores are comparable (equivalent
sets, with S_1 = S_2), and we simply say that our imaginary scores, X_3 and
X_4, are comparable with each other and also with X_1 and X_2; hence all
four Ss have the same value, and therefore cancel out, leaving

    r_{ab} = \frac{r_{13} + r_{14} + r_{23} + r_{24}}{\sqrt{2 + 2r_{12}}\,\sqrt{2 + 2r_{34}}}

Comparable or equivalent sets of scores will correlate equally with each
other, that is, the five unknown rs in this expression will all equal r_12, our
known value. Therefore we have

    r_{xx} = r_{ab} = \frac{4r_{12}}{\sqrt{2 + 2r_{12}}\,\sqrt{2 + 2r_{12}}} = \frac{2r_{12}}{1 + r_{12}}    (10.17)

as the reliability of scores based on the whole test.

The only assumption underlying formula (10.17) is that the two halves
being correlated are comparable (equivalent or parallel). If the test items
have been arranged according to difficulty, a first-half vs. second-half
reliability will not satisfy the notion of comparable measures. Ordinarily the
odd-even item technique will satisfy the criteria of comparability and
sameness of trait. Neither of the split-half methods will satisfy the
assumption of uncorrelated errors. Since both measures are determined at the
same sitting, any chance fluctuations due to physiological conditions or to
chance factors in the test situation will influence the two scores of an
individual in the same direction. It is to be expected, therefore, that the
correlation of halves will in general lead to a reliability coefficient which is
too high, giving us an exaggerated notion of the accuracy with which we can
place an individual on the trait continuum.
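
The step from the correlation between halves to the reliability of the whole test, formula (10.17), can be sketched as below. The function names and the half-test correlation of .60 are hypothetical; the second function evaluates the intermediate four-r expression with every r set equal to r_12, as a check on the reduction.

```python
import math

def whole_test_reliability(r_halves):
    """Formula (10.17): reliability of total scores from the r between halves."""
    return 2 * r_halves / (1 + r_halves)

def from_four_r_expression(r12):
    """The unreduced expression, with all of the rs equal to r12."""
    numerator = 4 * r12
    denominator = math.sqrt(2 + 2 * r12) * math.sqrt(2 + 2 * r12)
    return numerator / denominator

# Hypothetical correlation between halves, for illustration only.
print(whole_test_reliability(0.60), from_four_r_expression(0.60))
```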

By far the best method for determining the reliability of a test is to have 
two forms which have been made equivalent and comparable by careful 
selection and balancing of items. No item in one form should be so nearly 
identical with an item in the other form as to permit a direct memory 
transfer. Two forms, equivalent yet not identical, can be administered 
within, say, 2 weeks’ time—a procedure which properly includes in the 
estimate of variable error the daily fluctuations due to either physiological 
or psychological conditions and variations due to chance factors in the 
physical situation in which the tests are given. With so short an interval
between testings, the trait being measured will have changed only a
negligible amount as a result of maturation or ordinary environmental influences.

The form versus form method for calculating reliability may reflect two
major sources of unreliability: instrumental error and trait instability.
If it is claimed that X indicates a person's position on a scale and if we
wish to know something of the precision of the score, we should so
determine S_e as to include both major sources of error. High precision for a
score earned today is a necessary but not sufficient condition for score
stability; if the day-to-day variation happens to be large we cannot have a
very dependable score; its lack of dependability associated with the
accident of measurement on a particular day should be incorporated in S_e.

When we attempt to obtain the reliability of a learning score or of any 
performance which is influenced by practice, we encounter difficulties 
which are baffling to the researcher who rigorously adheres to the funda- 
mental requisites of the reliability coefficient. The chief difficulty is the 
obvious fact that the “thing” being measured changes as a result of each 
measurement or trial. Test-retest, or first half vs. second half (of trials), 
or today’s trials vs. tomorrow’s will not represent measures of the same 
function, nor will any scheme analogous to equivalent forms avoid this 
difficulty, since "forms" which are comparable will permit transfer. The
use of scores on odd vs. even trials will have the advantage of balancing
somewhat the influence of practice, especially if several trials are given;
but the possibility that a chance error affects odds and evens alike is present,
in that a slip in the experimental procedure or a temporary discouragement
on the part of the testee or the adoption by the subject of a poor approach
to the problem will have a similar effect on both scores. If trials were
spaced, say, a day apart, the factors just mentioned might not greatly
disturb the reliability determination. In general, it can be said that the
odd-even trial method will yield a reliability coefficient which is higher than
the "true" reliability.

The same shortcomings are present in the aforementioned methods when 
they are employed in determining the reliability of animal (or human) 
maze-learning scores. Other techniques, peculiar to the maze situation, 
have been proposed. Performances on the odd and even blinds, somewhat 
similar to odd and even items, have been correlated for the purpose of 
reliability, but since blinds differ considerably as regards difficulty, we 
cannot be sure that the two halves are comparable. We can also question 
the comparability of the first half and second half of the maze, since in 
general the last part tends to be learned more quickly than the first. 
Attempts to ascertain the reliability of one maze by correlating perform- 
ance on it with that on another maze involve several difficulties. In the 
first place, there seems to be a general positive transfer (perhaps a general 
adaptation to the maze situation) from a first to a second maze; secondly, 
the second maze must be similar to the first in order to satisfy the requisite 
of comparable measures of the same ability, but if this similarity ap- 
proaches identity the second maze becomes a retest; and thirdly, a close 
degree of similarity will lead to possible interference effects which may 
act differentially from animal to animal. 

The foregoing brief discussion of the requisites for and difficulties in 
arriving at a meaningful reliability coefficient should make obvious the 
necessity for examining critically any proposed method of determining the 
reliability of a psychological measurement. The interpretation of the 
reliability coefficient in terms of the standard error of measurement 
definitely assumes homoscedasticity, which is another way of saying that 
the reliability coefficient is valid only when the error of measurement is of 
the same order of magnitude for the entire range of scores. That this may 
not always hold true is evident from findings with the 1937 Stanford 
Revision of the Binet Test. 

It should be noted that the magnitude of the reliability coefficient is
influenced by the trait homogeneity of the sample on which it is based.
Let sd represent the standard deviation for the restricted range, SD the
standard deviation for the unrestricted range, r_xx the reliability for the
restricted, and R_xx the reliability for the unrestricted. If we may assume
that S_e for the smaller range equals S_e for the larger range, we may write

    (sd)^2(1 - r_{xx}) = (SD)^2(1 - R_{xx})    (10.18)

as a formula from which we can infer r_xx from R_xx, and vice versa. The
more homogeneous the group, the lower the reliability coefficient.

Attenuation. Now we return to the question which led to this lengthy
detour: How does unreliability affect the correlation between variables?


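Let 

$$x = x_t + e$$
$$y = y_t + d$$

where e and d represent the variable errors in the two scores, x and y. Then 

$$r_{xy} = \frac{\sum (x_t + e)(y_t + d)}{N S_x S_y} = \frac{\sum x_t y_t + \sum x_t d + \sum y_t e + \sum ed}{N S_x S_y}$$

If we assume that d is uncorrelated with $x_t$, that e is uncorrelated with $y_t$, 
and that e and d are uncorrelated, we have 

$$r_{xy} = \frac{\sum x_t y_t}{N S_x S_y} = \frac{r_{x_t y_t} S_{x_t} S_{y_t}}{S_x S_y}$$

Since, in general, $S_t = S\sqrt{r_{xx}}$ by formula (10.14), we have 

$$r_{xy} = \frac{r_{x_t y_t}\, S_x\sqrt{r_{xx}}\; S_y\sqrt{r_{yy}}}{S_x S_y} \qquad (r_{x_t y_t} = r \text{ between true scores})$$

$$r_{xy} = r_{x_t y_t}\sqrt{r_{xx} r_{yy}} \tag{10.19}$$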


which, since the reliability coefficients are less than unity, shows clearly 
that the correlation between obtained scores will be less than the correlation 
between true scores; i.e., errors of measurement tend to reduce or attenuate 
the correlation between traits. 

We can rearrange formula (10.19) as 

$$r_{x_t y_t} = \frac{r_{xy}}{\sqrt{r_{xx} r_{yy}}} \tag{10.20}$$


by which we can estimate what the correlation would be if perfect, 
errorless, measures were available. This is known as correction for 
attenuation. Correlation coefficients corrected for attenuation are of 
theoretical importance in the analysis of relationships in that allowance 
can be made for variable errors of measurement, but such corrected rs are 
of little practical value since they cannot be used in prediction equations. 
The prediction of one variable from another and the accompanying error 
of estimate must necessarily be based on obtained, or fallible, rather than 
true scores. 

Since the correlation between variables is a function of the reliability 
of their measurement, we may examine the limits imposed on r as a result 
of fallible scores. By reference to formula (10.19), we observe that, if the 
correlation between true scores is unity and if the reliability for one 
variable is perfect, the obtained correlation between the two cannot exceed 
the square root of the reliability coefficient for the other variable. If the 
correlation between the true scores is perfect, and if each variable is subject 
to errors of measurement, the obtained correlation cannot exceed the 
product of the square roots of the two reliability coefficients. Obviously, 
if the reliabilities are the same, the obtained correlation cannot be greater 
than the reliability coefficient. 
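
The ceiling implied by (10.19) and the correction in (10.20) can be checked with a few lines of arithmetic. The following is only a minimal Python sketch: the function names are hypothetical, and the reliabilities (.81 and .64) and the obtained r (.45) are illustrative values assumed for the example, not figures from the text.

```python
import math

def attenuated_r(r_true, r_xx, r_yy):
    """Formula (10.19): obtained r implied by a given true-score r."""
    return r_true * math.sqrt(r_xx * r_yy)

def corrected_r(r_xy, r_xx, r_yy):
    """Formula (10.20): estimated true-score r from an obtained r."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Assumed reliabilities; with a perfect true-score correlation the
# obtained r cannot exceed sqrt(.81) * sqrt(.64).
ceiling = attenuated_r(1.0, 0.81, 0.64)
print(round(ceiling, 2))                         # about 0.72

# An assumed obtained r of .45 corrected for attenuation.
print(round(corrected_r(0.45, 0.81, 0.64), 3))   # about 0.625
```

With equal reliabilities the ceiling reduces to the reliability coefficient itself, as stated above.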

In addition to the assumptions which were made specifically in deriving 
the formula for correcting for attenuation, it is also necessary to meet all 
the assumptions required for a sound reliability coefficient. Since obtained 
correlations and also reliability coefficients are functions of the homo- 
geneity, with respect to the two traits, of the sample on which they are 
based, it follows that the reliability coefficients used in correcting an 
obtained r should be based on the same sample as r or on a sample which 
is of comparable homogeneity. Corrected rs greatly in excess of unity 
have been reported. Such absurd results lead us to ask whether the 
assumptions have been met, but this question should be raised concerning 
any corrected r, even though it does not exceed unity, since the assump- 
tions are difficult to meet. It has been said that a corrected r can legiti- 
mately exceed unity by as much as 2 or 3 times its sampling error. 
Formulas for the standard error of a corrected r are available, but nothing 
is known concerning the nature of the distribution of corrected rs for 
successive samples. Presumably this distribution would be markedly 
skewed for high values; hence the use of an ordinary standard error 
technique to determine whether a corrected r exceeds unity (or any other 
magnitude) by more than can reasonably be expected on the basis of 
sampling is an unsound procedure. 

Measurement error and comparison of means. Will the presence of 
errors of measurement affect the large sample, z, and the small sample, t, 
tests for the difference between means? It seems reasonable to presume 
that $D_M = M_1 - M_2$ would not be systematically affected by measurement 
errors, because positive and negative errors would tend to balance so 
that each $M_x$ would tend to equal the $M_t$ (the true-score mean) for the 
sample. The value of $S_{D_M}$ or $s_{D_M}$ for independent means will be increased 
by errors of measurement because score variance is increased by such 
errors [see formula (10.12)]. For correlated means, $S_{M_D}$ and $s_{M_D}$ will be 
increased by measurement errors because the variance of the difference 
scores is increased by unreliability. Thus for either independent or 
correlated means, the z and t are systematically reduced by errors of 
measurement. Moral: the use of unreliable measures is not conducive to 
the finding of significant differences. 

Slope and measurement errors. The slope for the regression of Y on 
X, $B_{yx} = r_{xy}(S_y/S_x)$, can be written in terms of the correlation between true 
scores and the Ss for true scores by utilizing (10.20) and (10.14). Thus 

$$B_{yx} = r_{xy}\frac{S_y}{S_x} = r_{x_t y_t}\sqrt{r_{xx} r_{yy}}\;\frac{S_{y_t}/\sqrt{r_{yy}}}{S_{x_t}/\sqrt{r_{xx}}} = r_{xx}\, r_{x_t y_t}\frac{S_{y_t}}{S_{x_t}}$$

from which we see that the slope in terms of true scores is larger than 
the slope based on obtained scores; that is, the slope is reduced by errors 
in X. Note that the errors in Y do not affect the slope: this is reasonable 
because for a fixed value of X (or for an interval on X), the average value 
of Y will involve a balancing of the chance errors in the Ys, that is, the 
means of the vertical arrays will not be systematically affected by the 
errors in the Ys. Therefore the slope of the line "fitting" the array means 
will not be systematically affected by the Y errors. For those primarily 
interested in slopes, it would seem unimportant to have high reliability for 
Y, but it must be noted that the sampling variance of $B_{yx}$ will be increased 
by errors in Y via an increase in the residual variance. 
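
That only the reliability of X enters the slope can be verified numerically. The sketch below is an illustration under assumed values (a true-score correlation of .6, true-score Ss of 10 and 5, and $r_{xx}$ = .90); the function name is hypothetical. It builds the obtained-score quantities from (10.14) and (10.19) and shows that the resulting slope equals $r_{xx}$ times the true-score slope whatever the reliability of Y.

```python
import math

def obtained_slope(r_true, s_yt, s_xt, r_xx, r_yy):
    """Slope of Y on X computed from obtained-score quantities."""
    s_x = s_xt / math.sqrt(r_xx)            # (10.14): S_t = S * sqrt(r_xx)
    s_y = s_yt / math.sqrt(r_yy)
    r_xy = r_true * math.sqrt(r_xx * r_yy)  # (10.19)
    return r_xy * s_y / s_x

true_slope = 0.6 * 10.0 / 5.0               # r_t * S_yt / S_xt = 1.2

# Vary the reliability of Y; the obtained slope stays at r_xx * true_slope.
for r_yy in (1.0, 0.8, 0.5):
    print(r_yy, round(obtained_slope(0.6, 10.0, 5.0, 0.9, r_yy), 4))
```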

Reliability of difference scores. There are three situations for which 
we may wish to say whether two scores differ more than expected on the 
basis of errors of measurement. First, two persons each with a score on a 
given test. Second, a change score based on an "initial" and a "final" 
score for a person. Third, the difference score for a person on two different 
tests. 

For the first situation, the standard error of the difference is given by 

$$\sqrt{S_e^2 + S_e^2} = S_e\sqrt{2}$$

A difference would need to be approximately $2S_e\sqrt{2}$ to be significant at 
the .05 level. 

For the second situation, let us be a bit unconventional in terminology 
by letting Y stand for an initial score and X stand for a second score, on 
the same variable, taken after an experience or time interval that might 
produce a change. (Ordinarily, we let $X_1$ and $X_2$ or $X_i$ and $X_f$ stand for 
such scores; the Y and X notation will have certain conveniences in the 
sequel.) Thus the change score is given by $D = X - Y$ or, in deviation 
units, as $d = x - y$. Although Y and X are based on the same instrument 
or test, the second score, or X, might be regarded as measuring the 
same thing as Y plus the experimentally produced effect, variable over 
individuals. 

Let the error in D be represented by 

$$e_d = d - d_t = (x - y) - (x_t - y_t) = (x - x_t) - (y - y_t)$$

or 

$$e_d = e_x - e_y$$

Assuming that the errors in X and Y are uncorrelated, we have by the 
variance theorem, 

$$S_{e_d}^2 = S_{e_x}^2 + S_{e_y}^2 = S_x^2(1 - r_{xx}) + S_y^2(1 - r_{yy}) \tag{10.21}$$


the square root of which will yield the standard error (of measurement) for 
the change score. In order to determine the reliability coefficient for change 
(difference) scores, we may utilize (10.15) with a shift to subscripts 
appropriate to the present problem: 

$$r_{dd} = 1 - \frac{S_{e_d}^2}{S_d^2}$$

in which $S_d^2$ is the variance of difference (change) scores and is given (see 
p. 81) by 

$$S_d^2 = S_x^2 + S_y^2 - 2r_{xy}S_xS_y$$

Substituting, we get 

$$r_{dd} = 1 - \frac{S_x^2(1 - r_{xx}) + S_y^2(1 - r_{yy})}{S_x^2 + S_y^2 - 2r_{xy}S_xS_y} \tag{10.22}$$

An estimate of $r_{dd}$ based solely on the N cases in an investigation 
involving change scores poses a problem: how to ascertain the needed $r_{xx}$ 
and $r_{yy}$ values? Apparently, the most feasible procedure would be to use 
the odd-even, Spearman-Brown, method for calculating each value. We 
might, however, secure a fair approximation of $r_{dd}$ when the scores are 
based on a standardized test or a scale of known reliability provided it 
seems safe to assume that the error of measurement variance is the same for 
the N cases at hand as that for the group on which the known reliability 
coefficient was calculated and provided it can be assumed that the error of 
measurement variances for the first and second set of scores on our N 
cases are equal. The first assumption is tenable unless our N cases are 
highly atypical compared with those in the group yielding the known 
reliability coefficient. (Note that we need to assume the equivalence 
of two $S_e$ values for this first assumption, not the equivalence of two 
reliability coefficients. Why?) The second assumption would be questionable 
if the imposed experimental condition led to drastic changes (unlikely 
in most investigations). The approximation of $r_{dd}$ would be obtained by 
replacing each of the numerator terms in (10.22) by the already available 
$S_e^2$ of the test. Since the odd-even method is less fraught with 
assumptions, its use is preferable. 

It is of interest to simplify (10.22) to 

$$r_{dd} = \frac{r_{xx}S_x^2 + r_{yy}S_y^2 - 2r_{xy}S_xS_y}{S_x^2 + S_y^2 - 2r_{xy}S_xS_y} \tag{10.23}$$

and then note what happens when $S_x = S_y$ (i.e., when no change in 
variation occurs). The Ss cancel, giving 

$$r_{dd} = \frac{r_{xx} + r_{yy} - 2r_{xy}}{2 - 2r_{xy}} \tag{10.24}$$

Under these conditions $r_{xx}$ could very well equal $r_{yy}$, so that we would have 

$$r_{dd} = \frac{r_{xx} - r_{xy}}{1 - r_{xy}} \tag{10.25}$$

which makes it quite apparent that as $r_{xy}$ approaches $r_{xx}$ the value of $r_{dd}$ 
approaches zero, a proposition that also tends to hold for $r_{dd}$ by way of 
(10.23) and its exact equivalent, (10.22). This means that for change 
scores to be reliable the experimentally produced effect must lead to a 
shift in the ordering of individuals (regardless of the presence or absence 
of an over-all mean change). With an $r_{xx}$ of, say, .90 and $r_{xy}$ of, say, .80, 
the value of $r_{dd}$ by (10.25) is only .50. 

Although the foregoing was couched in terms of experimentally pro- 
duced changes, it is obvious that the same deductions hold for long-time 
changes in longitudinal studies. For either case the reliability of change 
scores may be surprisingly low despite high reliability for initial and final 
scores. The most serious consequence occurs when changes on one 
variable are being correlated either with changes on another variable or 
with scores on some other variable: the rs will be attenuated, sometimes 
so much as to make it difficult to obtain statistically significant rs. 
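
How quickly the reliability of change scores falls as the initial-final correlation approaches the reliability of the separate scores can be seen by tabulating the ratio of reliable difference variance to total difference variance. The sketch below is illustrative only: the function name is hypothetical, and the reliabilities of .90 and the several values of the initial-final correlation are assumed for the example.

```python
def difference_reliability(r_xx, r_yy, r_xy, s_x=1.0, s_y=1.0):
    """Reliability of difference scores, X minus Y.

    Numerator: reliable variance of the difference; denominator: total
    variance of the difference. With s_x == s_y the Ss cancel.
    """
    num = r_xx * s_x**2 + r_yy * s_y**2 - 2 * r_xy * s_x * s_y
    den = s_x**2 + s_y**2 - 2 * r_xy * s_x * s_y
    return num / den

# Both scores assumed to have reliability .90; vary the X-Y correlation.
for r_xy in (0.50, 0.70, 0.80, 0.89):
    print(r_xy, round(difference_reliability(0.90, 0.90, r_xy), 3))
```

The row for a correlation of .80 reproduces the value of .50 cited in the text.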

The (usual) unreliability of change scores poses a paradox when the 
question of the significance of the mean, or over-all, change is considered. 
Suppose $S_x = S_y$ and suppose the mean change is appreciable, say half of 
S in magnitude, and suppose further that $r_{xy}$ is very nearly equal to $r_{xx}$ 
with a consequent $r_{dd}$ of near zero. How could a mean change based on 
changes so lacking in reliability possess statistical significance? In 
answering this, it should be noted that two things happen as $r_{xy}$ approaches 
$r_{xx}$ as its limit: not only do the change scores become more unreliable 
but also the standard error of the difference (= standard error of the 
mean change) is progressively reduced by the increasing $r_{xy}$ in the standard 
error of the difference between initial and final means. Such an occurrence 
(very high $r_{xy}$ and a substantial mean change) is highly academic because 
produced changes are usually different from person to person rather than 
nearly constant over persons as implied by a very high $r_{xy}$. 

The third situation for which we may need and in fact should have 
information regarding the reliability of difference scores occurs when for 
each of several persons we have a difference between scores on two different 
tests, or variables. For such difference scores to have meaning, the two 
sets of scores should be in comparable units such as standard scores or 
T scores, with equal means and Ss. Under these conditions the reliability 
of difference scores will be given by (10.24) with the subscripts referring 
to the two different variables, or tests. This time it is high correlation 
between the two variables that tends to lead to unreliable difference 
scores even though $r_{xx}$ and $r_{yy}$ are satisfactorily high. Again, the 
unreliability will limit the degree of correlation of such difference scores with 
other differences or with other variables. The instability of difference 
scores would seem to limit their possible usefulness in diagnostic work or 
in guidance programs. But it should be noted that even though difference 
scores may not provide a very reliable basis for differentiating among 
individuals, a difference for a particular individual may be dependable 
provided it is sufficiently large, say $1.96S_{e_d}$, for which $S_{e_d}$ would be 
obtained as the square root of the middle or right-hand part of (10.21), 
with the components in standard score form. Needless to say, even large 
differences may not have diagnostic or guidance significance—empirical 
study is needed to demonstrate their value. 
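
The size of difference needed for an individual can be worked out directly from the right-hand part of (10.21). The following sketch assumes, purely for illustration, two tests expressed as T scores (S = 10) with reliabilities of .90 and .85; the function name is hypothetical.

```python
import math

def se_of_difference(s_x, s_y, r_xx, r_yy):
    """Square root of (10.21): standard error of measurement of a difference."""
    return math.sqrt(s_x**2 * (1 - r_xx) + s_y**2 * (1 - r_yy))

# Assumed: both tests in T-score units, reliabilities .90 and .85.
s_ed = se_of_difference(10.0, 10.0, 0.90, 0.85)
threshold = 1.96 * s_ed
print(round(s_ed, 2), round(threshold, 2))   # about 5.0 and 9.8
```

Under these assumptions a person's two T scores would have to differ by roughly 10 points before the difference could be regarded as dependable at the .05 level.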

Measurement errors and a regression phenomenon. Suppose we have 
the scatterplot for the scores on two comparable forms of a test for N 
persons. With a form versus form correlation, or reliability coefficient, of, 
say, .85 the regression line for the second form score on the first score (or 
the line one might use to predict $X_2$ from $X_1$) will have a slope of .85 
(assumes $S_1 = S_2$), which means, of course, that those initially below 
average and those initially above average have "regressed" toward the 
mean on the second testing. That is, there would seem to be a tendency for 
the initially low to gain and the initially high to lose, which implies a 
negative correlation between initial score and gain. 

Let's take an algebraic look at the situation. 

Let $X_1$ = initial score 
$X_2$ = final score 
$X_g = X_2 - X_1$, or in deviation units $x_g = x_2 - x_1$ = gain 

Then 

$$r_{1g} = \frac{\sum x_1 x_g}{N S_1 S_g} = \frac{\sum x_1(x_2 - x_1)}{N S_1 S_g} = \frac{\sum x_1 x_2 - \sum x_1^2}{N S_1 S_g}$$

$$r_{1g} = \frac{r_{12}S_1S_2 - S_1^2}{S_1 S_g} = \frac{r_{12}S_2 - S_1}{\sqrt{S_1^2 + S_2^2 - 2r_{12}S_1S_2}} \tag{10.26}$$

as a general formula for the correlation of gain with initial score. No gain 
scores ever need to be calculated to obtain this correlation. For the special 
case where initial and final are simply based on two comparable forms 
with a short interval between the two testings, the Ss will be so nearly 
equal that they cancel out, leaving 

$$r_{1g} = \frac{r_{12} - 1}{\sqrt{2 - 2r_{12}}} \tag{10.27}$$

from which it is quite evident that the lower the value of $r_{12}$ (that is, the 
lower the form versus form reliability) the greater the negative correlation 
between gain and initial score. If $r_{12} = .85$, the value of $r_{1g}$ becomes $-.27$, 
sufficiently high to be statistically significant at the .01 level if N is in excess 
of 90. But this "significant" r has been produced by nothing more than 
errors of measurement—no real gain for the initially low or real loss for 
the initially high. 
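
The artifactual correlation of gain with initial score can be computed from the test-retest correlation and the two Ss alone, without forming any gain scores. The sketch below is a minimal illustration; the function name is hypothetical and the second example (a final S 30 per cent larger than the initial S) is an assumed value.

```python
import math

def gain_initial_r(r12, s1=1.0, s2=1.0):
    """Correlation of gain (final minus initial) with initial score.

    Covariance of initial with gain over the product of their Ss;
    with s1 == s2 the Ss cancel.
    """
    return (r12 * s2 - s1) / math.sqrt(s1**2 + s2**2 - 2 * r12 * s1 * s2)

# Equal Ss and a form-versus-form correlation of .85.
print(round(gain_initial_r(0.85), 2))             # about -0.27

# Assumed case: final S appreciably larger than initial S.
print(round(gain_initial_r(0.85, 1.0, 1.3), 2))   # positive
```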

More generally, it is seen from (10.26) that in follow-up studies involving 
test (initial) and retest (final) scores there will be a negative correlation 
between gain and initial score unless $S_2$ has increased appreciably over $S_1$. 
This negative correlation is produced in part by the attenuating effect of 
unreliability on $r_{12}$ and in part by whatever factors contribute to real 
differential changes from initial to final. Before concluding that gain and 
initial status are really correlated, we need to get rid of that part caused by 
errors of measurement; that is, we need a correction for the regression that 
occurs solely on the basis of unreliability. This can be achieved provided 
we have an $r_{11}$ that holds for the group at hand [say an odd-even 
Spearman-Brown estimate or the best (in sampling sense) available reliability 
coefficient adjusted for difference in variances for the two groups, the 
adjustment by way of formula (10.18)]. The calculated $r_{11}$ would represent 
an estimate of $r_{12}$, the correlation between the first and second test under 
the condition of no changes taking place other than those attributable to 
measurement errors. It would be presumed that with no real changes 
occurring, $S_2$ would tend to equal $S_1$, hence the prediction equation in 
deviation units would be $x'_2 = r_{12}x_1 = r_{11}x_1$, or in raw scores, 

$$X'_2 = r_{11}X_1 + M_2 - r_{11}M_1$$

with the unknown $M_2$ taken as equal to $M_1$. Then the gain would be taken 
as the actual $X_2$ minus the predicted $X_2$ given by the foregoing $X'_2$. We 
are taking $X'_2$ as our best regression estimate of what the 
final score would be in case there were no change-producing conditions 
intervening between the testings. Actually $X'_2$ may be regarded as an 
initial score adjusted so as to eliminate that part of the regression of 
final on initial score that is produced by measurement errors. A little 
algebra will help at this point. 

In terms of deviation units we have $x_1$, $x_2$, $x'_2 = r_{11}x_1$, and 
$g = x_2 - x'_2 = x_2 - r_{11}x_1$; hence the correlation between the adjusted 
initial scores, $x'_2$, and gains from the adjusted initial scores becomes 

$$r_{x'_2 g} = \frac{\sum x'_2 g}{N S_{x'_2} S_g} = \frac{\sum r_{11}x_1(x_2 - r_{11}x_1)}{N r_{11}S_1 S_g} = \frac{r_{12}S_2 - r_{11}S_1}{\sqrt{S_2^2 + r_{11}^2S_1^2 - 2r_{11}r_{12}S_1S_2}} \tag{10.28}$$

which, as a computational formula, gives the desired adjusted correlation 
without the tedium of first calculating N values of $X'_2$ and N differences, and 
then a correlation coefficient. Note from (10.28) that when $S_2 = S_1$ 
and when $r_{12} = r_{11}$ the adjusted correlation between gain and initial 
becomes zero. Contrast this with the artifactual $r_{1g}$ given by (10.26) for 
the same conditions. 
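
The adjusted correlation can be sketched in code. The expression used below is derived here from the definitions $x'_2 = r_{11}x_1$ and $g = x_2 - r_{11}x_1$ (covariance of $x'_2$ with g divided by the product of their standard deviations); it is offered as an assumption-laden illustration rather than as a transcription of the printed formula, and the function name and numerical values are hypothetical.

```python
import math

def adjusted_gain_r(r12, r11, s1, s2):
    """Correlation of adjusted initial score (r11 * x1) with gain from it.

    Derived from: cov(r11*x1, x2 - r11*x1) / (S of r11*x1 times S of gain).
    Assumes r11 > 0 so that it cancels from numerator and denominator.
    """
    num = r12 * s2 - r11 * s1
    den = math.sqrt(s2**2 + (r11 * s1)**2 - 2 * r11 * r12 * s1 * s2)
    return num / den

# No real change: equal Ss and test-retest r equal to the reliability.
print(adjusted_gain_r(0.85, 0.85, 1.0, 1.0))            # 0.0

# Assumed case: retest r (.70) below the reliability (.85).
print(round(adjusted_gain_r(0.70, 0.85, 1.0, 1.0), 3))  # negative
```

In the first case the unadjusted correlation of gain with initial score would be about -.27, whereas the adjusted value vanishes.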

Another approach to the elimination of the effect of errors of measure- 
ment on the correlation between initial and gain is to work with so-called 
regressed scores, a concept which we will need in another connection. 
Consider the problem of estimating a true score, $x_t$, from an obtained 
score, x. We can write a prediction equation as 

$$x'_t = r_{x_t x}\frac{S_t}{S_x}x \tag{10.29}$$

As earlier, it is assumed that $x_t$ and e are uncorrelated, so we have 

$$r_{x_t x} = \frac{\sum x_t(x_t + e)}{N S_t S_x} = \frac{S_t^2}{S_t S_x} = \frac{S_t}{S_x}$$

which from (10.14) $= \sqrt{r_{xx}}$. Also from (10.14) we have $S_t = S_x\sqrt{r_{xx}}$. 
Substituting in (10.29), 

$$x'_t = \sqrt{r_{xx}}\,\sqrt{r_{xx}}\;x = r_{xx}x \tag{10.30}$$

Note that the estimate, $x'_t$, is identical to $x'_2$ estimated from $x_1$ or to $x'_1$ 
estimated from $x_2$, both $x'_2$ and $x'_1$ being estimates under the condition of 
no changes except those due to errors of measurement. We see now that in 
saying that $x'_2$ or $X'_2$ may be regarded as a sort of adjusted value for the 
first or initial score, we are in effect so adjusting the initial score as to 
yield an estimate of the true score. The $x'_t$ values are referred to as 
regressed scores. Obviously, the utilization of regressed initial scores as a 
basis for determining whether or not gains are correlated with initial 
(regressed) scores will lead to (10.28). Conceptually, it may seem 
preferable to think in terms of $x'_2$ as an adjusted initial score that makes 
allowance for the error of measurement part of the regression of final on 
initial standing. 

A matched group fallacy. Occasionally, in the comparison of group 
changes, either long-term or experimentally produced, we may be dealing 
with samples from two populations that are known to differ appreciably; 
that is, random sampling will yield groups that will differ in initial score 
level. In order to have groups with comparable initial standing, cases are 
paired, a procedure which is ordinarily desirable but which for this given 
situation introduces a difficulty in that the matching will involve pairing 
some persons from the top half of one population with persons from the 
bottom half of the other population. Upon subsequent testing and without 
any interpolated change-producing experience, one group will show gain 
whereas the other group will show loss, but this difference in change 
represents nothing more than the regression of scores in each group 
toward the respective mean of the larger group from which it was drawn. 
And of course, with change-producing conditions this type of regression 
due to errors of measurement will contaminate (either increase or decrease) 
any real difference in change between the two groups. 

A second type of situation in which matching may be disrupted by 
measurement errors occurs when pairing is used to obtain two groups that 
are comparable on some variable, Y, which should be controlled in making 
a comparison on variable X. If the matching on Y involves pairing a 
person from the upper-half of one supply group with a person from the 
lower-half of the other supply group, the two matched samples, so set up, 
will tend to have equal means for Y, but will nevertheless differ on the 
Y variable because of the errors in the Y scores used in pairing. An 
immediate retest on Y will show that those from the top part of one 
"population" (supply) will average lower on Y, whereas those from the 
bottom portion of the other supply will average higher on Y, than 
originally. 

For either of these situations (control on initial score or on another 
variable) the failure to obtain really comparable groups will occur even 
though not all pairs involve top-half versus bottom-half status. The 
larger the percentage of such pairs, the greater the disruption. For both 
situations, the trouble can be avoided by pairing on the basis of regressed 
scores. Separate regression equations of the form $x'_t = r_{xx}x$ (or 
$y'_t = r_{yy}y$) will be needed for each group from which the (matched) samples 
are to be formed, with the value of $r_{xx}$ (or $r_{yy}$) taken as the reliability most 
appropriate for the particular group. The calculation of the regressed 
score is facilitated by casting the regression equation into raw score form: 
$X'_t = r_{xx}X + M_x - r_{xx}M_x$. 
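
The raw-score form of the regressed score amounts to pulling each obtained score toward its own group mean by the fraction $1 - r_{xx}$. A minimal sketch follows; the function name is hypothetical, and the group means (100 and 120), the reliability (.80), and the obtained scores are assumed values chosen only to illustrate pairing across two supply groups.

```python
def regressed_score(x, group_mean, reliability):
    """Raw-score regressed score: r_xx * X + (1 - r_xx) * M_x."""
    return reliability * x + (1 - reliability) * group_mean

# Two persons with the same obtained score of 110, drawn from supply
# groups with assumed means of 100 and 120 and reliability .80.
low_group = regressed_score(110, 100, 0.80)    # regresses down toward 100
high_group = regressed_score(110, 120, 0.80)   # regresses up toward 120
print(round(low_group, 1), round(high_group, 1))   # about 108.0 and 112.0
```

Although the obtained scores match, the regressed scores do not, which is why pairing should be done on the regressed values computed separately for each group.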


INDEX CORRELATION 


A possible source of error in correlational work may be introduced when 
two indexes having a common variable denominator are correlated, such 
as X/Z and Y/Z. Before considering this special case, it might be well to 
turn our attention to more general formulas for indexes. These formulas 
involve the coefficient of variation, namely, v = S/M, and their use leads 
to serious error when the vs are large—v and higher-power terms having 
been dropped in the derivations. 

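Let $I = X_1/X_2$; then it can be shown that the mean and standard 
deviation of such an index or ratio will be approximately 

$$M_I = \frac{M_1}{M_2}\left(1 + v_2^2 - r_{12}v_1v_2\right) \tag{10.31}$$

$$S_I = \frac{M_1}{M_2}\sqrt{v_1^2 + v_2^2 - 2r_{12}v_1v_2} \tag{10.32}$$

If we have four variables, the following formula for the correlation of 
indexes will yield a good approximation: 

$$r_{(X_1/X_3)(X_2/X_4)} = \frac{r_{12}v_1v_2 - r_{14}v_1v_4 - r_{23}v_2v_3 + r_{34}v_3v_4}{\sqrt{v_1^2 + v_3^2 - 2r_{13}v_1v_3}\,\sqrt{v_2^2 + v_4^2 - 2r_{24}v_2v_4}} \tag{10.33}$$

Although these formulas are very useful, their use is somewhat limited in 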
that generally we cannot know whether the index distribution is normal, 
nor can we make a statement concerning linearity and homoscedasticity 
for the correlation between two indexes. Such information, if needed, 
must be obtained by first determining the numerical value of the indexes for 
each individual and then making distributions. 

Several special cases can be deduced from formula (10.33). Thus the 
correlation between $X_1/X_3$ and $X_2$ is exactly equivalent to that between 
$X_1/X_3$ and $X_2/1$; i.e., $X_4$ is set equal to 1, which makes $v_4 = 0$, and 
therefore all terms involving the subscript 4 vanish. The correlation between 
$X_1/X_3$ and the reciprocal of a variable would be obtained by setting $X_2 = 1$, 
i.e., letting $1/X_4$ be the reciprocal; then $v_2 = 0$, whence the desired formula 
can be obtained by dropping all terms involving $v_2$. Likewise the correlation 
can be deduced for $1/X_3$ with $1/X_4$, for $1/X_3$ with $X_2$, and for $X_1/X_3$ 
with $X_2/X_1$. This last correlation is of particular interest because it is 
possible to find a relationship between these two indexes even though the 
three original variables are uncorrelated. 

By substituting $X_3$ for $X_4$, i.e., replacing subscript 4 by 3, an expression 
for the correlation of indexes having a common variable denominator can 
readily be obtained. It will be 

$$r_{(X_1/X_3)(X_2/X_3)} = \frac{r_{12}v_1v_2 - r_{13}v_1v_3 - r_{23}v_2v_3 + v_3^2}{\sqrt{v_1^2 + v_3^2 - 2r_{13}v_1v_3}\,\sqrt{v_2^2 + v_3^2 - 2r_{23}v_2v_3}} \tag{10.34}$$

If $r_{12} = r_{13} = r_{23} = 0$, this becomes 

$$\frac{v_3^2}{\sqrt{v_1^2 + v_3^2}\,\sqrt{v_2^2 + v_3^2}}$$

and if the vs are equal, the value of the index correlation will be .50 even 
though there is no relationship between the original variables. This is 
known as spurious correlation due to indexes. There are instances, however, 
in which an analysis of the interrelations of ratios is of just as much 
import as the analysis of the variables from which the indexes are obtained, 
and therefore it does not follow that the correlation between ratios having 
a common denominator is necessarily misleading. 
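
The spurious value of .50 can be reproduced from the approximation for indexes with a common denominator. In the sketch below the function name and argument names are hypothetical, and the coefficients of variation (.10, and then .10, .10, .05) are assumed for illustration; the approximation itself is the usual small-v one and should not be trusted when the vs are large.

```python
import math

def common_denominator_index_r(r12, r13, r23, v1, v2, v3):
    """Approximate r between X1/X3 and X2/X3 from rs and coefficients of variation."""
    num = r12 * v1 * v2 - r13 * v1 * v3 - r23 * v2 * v3 + v3**2
    den = math.sqrt(v1**2 + v3**2 - 2 * r13 * v1 * v3) * \
          math.sqrt(v2**2 + v3**2 - 2 * r23 * v2 * v3)
    return num / den

# Uncorrelated variables with equal coefficients of variation.
print(round(common_denominator_index_r(0, 0, 0, 0.10, 0.10, 0.10), 2))  # about 0.5

# Uncorrelated variables, denominator less variable than the numerators.
print(round(common_denominator_index_r(0, 0, 0, 0.10, 0.10, 0.05), 2))  # about 0.2
```

The second line shows that the size of the spurious component depends on how variable the common denominator is relative to the numerators.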

It has been asserted that the correlation between IQs derived from two 
tests or two forms of the same test will be spuriously high because of the 
common variable denominator, age. It can be shown, however, that such 
a correlation will not be spurious unless the two sets of IQs are correlated 
with age. If the IQ-vs.-age correlations are both positive or both negative, 
the index correlation will be spuriously high; if one is negative and the 
other positive, spuriously low. Thus, rather than make a blanket statement 
to the effect that the correlation between IQs is spuriously high, we should 
say that it can be spuriously high or low or not spurious at all, according 
to the IQ-vs.-age correlations. It should be remembered that, even though 
the IQs based on an ideal (properly constructed and standardized) test 
will be uncorrelated with age, a nonzero relationship might be produced 
for a single school-grade group by the selective factors that operate in 
age-grade location. Within a single grade group in a school system where 
acceleration is permitted, the younger children are likely to be the brighter, 
i.e., have the higher IQs, thus producing negative correlations for sets of 
IQs with age, and consequently a spuriously high correlation between IQs. 


PART-WHOLE CORRELATION 


Another type of spurious correlation arises when a total score is 
correlated with a subscore which is a part of the total score. Suppose that a total 
score is made up of three parts, $X_T = X_1 + X_2 + X_3$, and that we 
correlate $X_1$ against $X_T$. Ordinarily in such situations the components will 
themselves be correlated positively. It should be obvious that the extent 
to which $X_1$ correlates with $X_T$ is more or less dependent on the fact that 
$X_T$ includes $X_1$. It does not follow, however, that a high value for $r_{1T}$ is 
not meaningful, even though spurious. For instance, a high value for $r_{1T}$ 
would, regardless of spuriousness, justify the use of $X_1$ in lieu of the 
battery of three subtests. There are times when we may wish to know how 
highly a subtest correlates with a total, based on any number of parts, 
minus the subtest. This correlation is given by 

$$r_{1(T-1)} = \frac{r_{1T}S_T - S_1}{\sqrt{S_T^2 + S_1^2 - 2r_{1T}S_TS_1}} \tag{10.35}$$
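
The correlation of a part with the remainder of the total needs only the part-whole correlation and the two standard deviations. A minimal sketch follows; the function name is hypothetical and the numerical values ($r_{1T}$ = .80, $S_T$ = 10, $S_1$ = 4) are assumed for illustration.

```python
import math

def part_remainder_r(r_1t, s_t, s_1):
    """Correlation of subtest 1 with the total minus subtest 1.

    Covariance of the part with (total - part), divided by the product
    of the S of the part and the S of (total - part).
    """
    num = r_1t * s_t - s_1
    den = math.sqrt(s_t**2 + s_1**2 - 2 * r_1t * s_t * s_1)
    return num / den

# Assumed values: part-whole r of .80, S_T = 10, S_1 = 4.
print(round(part_remainder_r(0.80, 10.0, 4.0), 2))   # about 0.55
```

Under these assumed values the part-whole correlation of .80 drops to about .55 once the subtest is removed from the total.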


HETEROGENEITY WITH RESPECT TO A 
THIRD VARIABLE 


We have already discussed the influence on r of heterogeneity with 
regard to one or both the variables being correlated. Suppose variables $X_1$ 
and $X_2$ are two different traits, each of which is related to age as the third 
variable. Then an older individual will tend to be higher on both tests than 
a younger individual. In other words, heterogeneity with respect to age will 
tend to produce correlation between $X_1$ and $X_2$, and our present problem is 
to develop a method for correcting $r_{12}$ so that we can estimate what the 
correlation between $X_1$ and $X_2$ would be if age were constant. 

Suppose $r_{12}$, $r_{13}$, $r_{23}$, and the several means and standard deviations are 
known; then let us visualize the three scatter diagrams. The scatter for $r_{12}$ 
will be somewhat elongated as a result of the influence of age, since 
variation in both $X_1$ and $X_2$ is here supposed to be partly due to age 
variation. What is needed is the correlation, between measures of $X_1$ and 
$X_2$, which has been freed from the influence of age. If we were to express 
each $X_1$ in the first array of the scatter for $r_{13}$ as a deviation from the mean 
of this array and were to do the same for all other $X_1$s in the scatter (each 
as a deviation from the mean of the array in which it falls) we would have 
scores expressed as deviations from the means of the several ages. These 
deviations will be independent of age. As an example, suppose an 8-year-old 
individual scores 28 and the mean of 8-year-olds is 25, and a 14-year-old 
individual scores 54 and the mean of 14-year-olds is 51. The second 
individual scores higher than the first because he is older, but each would 
have a deviation (from his own age mean) of plus 3. Obviously, if we also 
expressed the $X_2$ scores as deviations from the averages for the several 
ages, they too would be independent of age influences. Now, if we 
correlated these deviations (from age means) we would be correlating sets of $X_1$ 
and $X_2$ scores which would be free from age, and hence we would arrive at 
a correlation, between variables $X_1$ and $X_2$, which would not be affected 
by age heterogeneity. 

Partial correlation. The task of determining the correlation between 
two variables, with the influence of a third eliminated, can always be 
accomplished by actually computing all the deviations and then making 
a scatter diagram from which the r can be determined. However, in those 
cases in which we can assume linearity of regression for $X_1$ on $X_3$ and $X_2$ on 
$X_3$ it is possible to set up a method for determining the desired correlation 
from the three correlation coefficients between the three variables. If 
linearity exists, we can correlate the deviations from the two regression 
lines instead of from the array means (or means for several ages if age is the 
third variable). Since 

$$x'_1 = r_{13}\frac{S_1}{S_3}x_3 \quad\text{and}\quad x'_2 = r_{23}\frac{S_2}{S_3}x_3$$

the two sets of deviation-from-regression scores will be 

$$x_1 - r_{13}\frac{S_1}{S_3}x_3 \quad\text{and}\quad x_2 - r_{23}\frac{S_2}{S_3}x_3$$

The correlation of these deviation scores, which is designated by the 
symbol $r_{12.3}$ (read: the correlation between $X_1$ and $X_2$ with $X_3$ held 
constant) and known as the partial correlation coefficient, becomes 

$$r_{12.3} = \frac{\sum\left(x_1 - r_{13}\frac{S_1}{S_3}x_3\right)\left(x_2 - r_{23}\frac{S_2}{S_3}x_3\right)}{N\, S_{(x_1 - x'_1)}\, S_{(x_2 - x'_2)}}$$

Multiplying and summing the numerator, and noting that the Ss in the 
denominator are nothing more than the errors of estimate, $S_{1.3}$ and $S_{2.3}$, 
we have 

$$r_{12.3} = \frac{N r_{12}S_1S_2 - N r_{13}r_{23}S_1S_2 - N r_{13}r_{23}S_1S_2 + N r_{13}r_{23}S_1S_2}{N\, S_1\sqrt{1 - r_{13}^2}\; S_2\sqrt{1 - r_{23}^2}}$$

Dividing by N, cancelling Ss, and collecting like terms, we get 

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \tag{10.36}$$


This formula definitely assumes the linearity of the two regression lines 
for predicting $X_1$ and $X_2$ from $X_3$. Whether we correlate deviations from 
array means or use formula (10.36), we end with a correlation which has 
been freed of the influence of the third, or eliminated, variable. If, for 
example, age is the third variable, the partial correlation coefficient 
represents an estimate of what the correlation would be if we held age 
constant by the use of individuals of any one of the several age levels 
present in the original group. 

The difference between $r_{12.3}$ and $r_{12}$ indicates how much of the correlation 
between variables 1 and 2 is due to the influence of heterogeneity of a third 
variable. Obviously, if the third variable is unrelated to $X_1$ and $X_2$, the 
partial r will equal $r_{12}$, and if either $r_{13}$ or $r_{23}$ is negative and $r_{12}$ positive, 
"partialing out" $X_3$ will raise the correlation. Is this reasonable? 
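
The first-order partial can be computed from the three zero-order correlations alone. The sketch below is a minimal illustration; the function name is hypothetical and the three sets of correlations are assumed values chosen to show the cases just described (third variable unrelated, third variable strongly related to both, and one of its correlations negative).

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2 with 3 held constant."""
    return (r12 - r13 * r23) / (math.sqrt(1 - r13**2) * math.sqrt(1 - r23**2))

# Third variable unrelated to both: the partial equals r12.
print(round(partial_r(0.60, 0.0, 0.0), 3))     # 0.6

# Third variable (say, age) strongly related to both: the partial shrinks.
print(round(partial_r(0.60, 0.70, 0.80), 3))   # about 0.093

# One of the correlations with the third variable negative: the partial rises.
print(round(partial_r(0.50, -0.40, 0.40), 3))  # about 0.786
```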

The difficulties encountered in determining the direction of causation 
make it necessary to be careful in the use of the partial correlation 
technique. When it is said that heterogeneity with respect to a third variable 
($X_3$) has in part (or entirely) produced correlation between $X_1$ and $X_2$, we 
must ask how the influence of $X_3$ comes about. Now if it can be argued that 
variation in $X_3$ is a cause of variation in $X_1$ and $X_2$, it is readily seen that 
$r_{12}$ is at least in part attributable to the fact that $X_1$ and $X_2$ have a common 
source of variation. The partial, $r_{12.3}$, tells us the degree of correlation 
between $X_1$ and $X_2$ which would exist provided variation in $X_3$ were 
controlled. But if it cannot be claimed that $X_3$ produces variation in $X_1$ 
and $X_2$, the interpretation of the partial r is far from clear. Suppose $X_1$ 
precedes $X_3$ in a temporal sense so that we know variation on $X_3$ couldn't 
possibly contribute to variation in $X_1$. Does it make sense to interpret $r_{12.3}$ 
as the correlation between $X_1$ and $X_2$ with the influence of $X_3$ nullified when 
we know that $X_3$ could not influence $X_1$? Stated differently, the only way 
that $X_3$ can produce or contribute to the correlation between $X_1$ and $X_2$ is 
by way of $X_3$ producing variation in $X_1$ and $X_2$. 

The technique can be extended for "partialing out" or eliminating more
than one variable. Thus, to obtain an estimate of $r_{12}$ with $X_3$ and $X_4$ held
constant, we can use

$$r_{12.34} = \frac{r_{12.3} - r_{14.3} r_{24.3}}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}}$$

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which is in terms of first-order partials calculable by formula (10.36). 
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The sampling error of the partial coefficient may be handled by the z 
transformation. The standard error of the corresponding z will be
$1/\sqrt{N - 4}$ when only one variable has been eliminated, and $1/\sqrt{N - 5}$
when two variables have been eliminated.
The partial correlation coefficient based on a small sample can also be
tested for significance by the t technique. If one variable has been
eliminated, we have

$$t = \frac{r_{12.3}\sqrt{N - 3}}{\sqrt{1 - r_{12.3}^2}}$$

with df = N − 3. An additional degree of freedom is lost for each
additional variable eliminated.
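The sampling treatment just described can be sketched in Python as follows. The correlations and the sample size are hypothetical, and the function names are assumptions made for illustration. The second-order partial is built from first-order partials, the standard error of z uses N − 4 or N − 5, and the t ratio loses one degree of freedom per eliminated variable.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation of a and b with c held constant, formula (10.36)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

def second_order_partial(r12, r13, r14, r23, r24, r34):
    """r12.34 built from the three first-order partials with X3 eliminated."""
    r12_3 = partial_r(r12, r13, r23)
    r14_3 = partial_r(r14, r13, r34)
    r24_3 = partial_r(r24, r23, r34)
    return partial_r(r12_3, r14_3, r24_3)

def z_standard_error(n, eliminated):
    """Standard error of the z transform of a partial r."""
    return 1.0 / math.sqrt(n - 3 - eliminated)

def t_for_partial(r, n, eliminated=1):
    """t ratio and degrees of freedom for testing a partial r against zero."""
    df = n - 2 - eliminated
    return r * math.sqrt(df / (1 - r**2)), df

# Hypothetical values: N = 28 cases and six zero-order correlations.
n = 28
r12_3 = partial_r(0.50, 0.40, 0.30)
r12_34 = second_order_partial(0.50, 0.40, 0.20, 0.30, 0.25, 0.35)
t, df = t_for_partial(r12_3, n)
print(r12_3, math.atanh(r12_3), z_standard_error(n, 1), t, df)
print(r12_34, z_standard_error(n, 2))
```

Eliminating $X_4$ first and $X_3$ second leads to the same second-order value, which gives a convenient check on the arithmetic.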

A perplexing and often-recurring question with regard to the inter- 
relations of three variables is this: Are the correlations consistent among 
themselves, or, if $r_{12}$ and $r_{13}$ are known, what are the possible limits for $r_{23}$?
If $r_{12}$ = unity and $r_{13}$ = unity, $r_{23}$ must also equal unity, but, if $r_{12} = 0$
and $r_{13} = 0$, does it follow that $r_{23} = 0$? It can be shown that the limits for
the correlation $r_{23}$ will always be

$$r_{12} r_{13} \pm \sqrt{1 - r_{12}^2 - r_{13}^2 + r_{12}^2 r_{13}^2}$$


EXAMPLES:


When $r_{12}$ and $r_{13}$ each equal .90, the limits for $r_{23}$ are +.62 and +1.00;
When $r_{12}$ and $r_{13}$ each equal .50, the limits for $r_{23}$ are −.50 and +1.00;
When $r_{12}$ and $r_{13}$ each equal .25, the limits for $r_{23}$ are −.875 and +1.00.
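The limits can be tabulated with a few lines of Python. This is a small sketch, and the function name is an assumption. It uses the fact that the quantity under the radical factors into $(1 - r_{12}^2)(1 - r_{13}^2)$.

```python
import math

def r23_limits(r12, r13):
    """Lower and upper limits for r23 consistent with given r12 and r13."""
    center = r12 * r13
    half_width = math.sqrt((1 - r12**2) * (1 - r13**2))
    return center - half_width, center + half_width

for r in (0.90, 0.50, 0.25, 0.0):
    low, high = r23_limits(r, r)
    print(f"r12 = r13 = {r:.2f}: limits {low:+.3f} and {high:+.3f}")
```

With both correlations at zero the limits are the full range from −1 to +1, so nothing at all follows about $r_{23}$.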


Part correlation. There are times when we may wish to have the
correlation between variables $X_1$ and $X_2$ with the influence of $X_3$ "removed"
from $X_2$ only. For example, we may wish to calculate the correlation
between intelligence and incidental memory with general memory partialled 
out of the incidental memory; or we may wonder what the correlation is 
between reading ability and academic achievement with the influence of 
college aptitude “taken out" of academic achievement; or we may wish to 
determine the correlation between intelligence and a set of final scores, 
obtained after extensive practice, with initial level partialled out of the 
final variance. 

In symbols, if we seek the correlation between $X_1$ and $X_2$ with $X_3$
partialled out of $X_2$, we would in effect be correlating $X_1$ with the residual,
$x_{2.3}$. From the derivation of $r_{12.3}$ it can be easily deduced that an
appropriate formula is

$$r_{1(2.3)} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{23}^2}} \tag{10.37}$$

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which is referred to as a part correlation coefficient. The part correlation 
method will, of course, be most useful when it can be argued that $X_3$ causes
variation in $X_2$ but not in $X_1$.
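A brief sketch of the part correlation, assuming Python with NumPy and simulated (hypothetical) scores. The function name is an illustrative assumption. The formula is compared with the direct correlation between $X_1$ and the residual of $X_2$ after $X_3$ has been taken out of $X_2$ only.

```python
import numpy as np

def part_r(r12, r13, r23):
    """Part correlation of X1 with X2 after X3 is removed from X2 only (10.37)."""
    return (r12 - r13 * r23) / np.sqrt(1 - r23**2)

# Hypothetical scores, for illustration only.
rng = np.random.default_rng(2)
x3 = rng.normal(size=400)
x2 = 0.8 * x3 + rng.normal(size=400)
x1 = 0.5 * x2 + rng.normal(size=400)

r = np.corrcoef([x1, x2, x3])

# Residual of x2 about its regression line on x3.
slope = np.cov(x2, x3, bias=True)[0, 1] / np.var(x3)
x2_residual = (x2 - x2.mean()) - slope * (x3 - x3.mean())

from_formula = part_r(r[0, 1], r[0, 2], r[1, 2])
from_residual = np.corrcoef(x1, x2_residual)[0, 1]
print(round(from_formula, 4), round(from_residual, 4))
```

The part coefficient differs from the partial of formula (10.36) only in lacking the factor $\sqrt{1 - r_{13}^2}$ in the denominator, so it can never exceed the partial in absolute size.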


SUMMARY 


In this chapter, consideration has been given to factors which have a 
bearing on the magnitude of the correlation coefficient. If any of these is 
operative in the case of a particular coefficient, it is the responsibility of the 
investigator to qualify his conclusions accordingly. Published reports of 
correlational studies should include the following. 

1. A definition of the population being sampled and a statement of the 
method used in drawing the sample. 

2. The size of the sample and an adequate treatment of sampling by 
means of nonantiquated formulas. 

3. The means and particularly the standard deviations of the variables 
being correlated, with some indication as to whether the sample is typical 
as regards heterogeneity with respect to the variables under consideration. 

4. The reliability coefficients for the measures and the method of deter- 
mining reliability. 

5. A statement relative to the homogeneity of the sample with respect to 
possibly relevant variables such as age, sex, race. 

6. A defense or precise interpretation of any reported correlations 
involving indexes or of any part-whole correlations. 

The researcher who is cognizant of the assumptions requisite for a given 
interpretation of a correlation coefficient and who is also fully aware of 
the many factors which may affect its magnitude will not regard the 
correlational technique as an easy road to scientific discovery. 


Chapter 11 
MULTIPLE CORRELATION 


So far our discussion of correlation has been concerned chiefly with the
prediction of one variable from another or the attributing of a portion of
the variance of one variable to the action of a second variable. We shall
next consider the case where it is desired to predict one variable by using
several other variables as a team of predictors, or where, if causation can
be assumed, an attempt is made to analyze the variance for one variable
into components or parts attributable to the action of two or more other
variables. There is a close connection between the predicting and the
analyzing problems; let us first consider the method of predicting one
variable on the basis of other variables.


THE THREE-VARIABLE PROBLEM


For simplicity, consider the problem of predicting $X_1$ from a knowledge
of $X_2$ and $X_3$. The $X_1$ variable is frequently called the criterion, or
dependent variable. If we had $X_1$ to be predicted from $X_2$ alone, we would have
exactly the same situation as predicting Y from X. That is, the linear
prediction equation (in gross score form)

$$Y' = BX + A$$

becomes

$$X'_1 = BX_2 + A$$

and the deviation form

$$y' = bx + a$$

becomes

$$x'_1 = bx_2 + a$$

It will be recalled that the values of the constants, B and A, or b and a,
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were so determined as to give the maximum predictability, and that B and 
A turned out to be functions of the correlation coefficient between the two 
variables and of the means and standard deviations for the variables. The 
equation which resulted from giving A and B specific values was said to be 
the equation of the best-fitting line—the error of prediction was minimized. 
Now, if we wish to predict $X_1$ from $X_2$ and $X_3$, we start with an equation
of the form

$$X'_1 = B_2 X_2 + B_3 X_3 + A \tag{11.1}$$

which can be written in deviation units as

$$x'_1 = b_2 x_2 + b_3 x_3 + a$$


Either of these forms represents the equation of a plane. It can be shown
that $B_2 = b_2$ and $B_3 = b_3$. In fact, this is rather obvious when we consider
the meaning of these B or b coefficients. They represent the slope of the
plane; $B_2$ is the slope which the plane makes with the $x_2$ axis, and $B_3$ the
slope with regard to the $x_3$ axis. When we shift from raw to deviation
scores, we are merely shifting the origin, or point of reference, to the
intersection of the means, and this point in terms of deviation scores becomes
zero. This shift of the frame of reference does not change the position or
angle of the plane; hence $B_2 = b_2$ and $B_3 = b_3$. (The student will recall
that, for the ordinary two-variable problem, the slope of the line was equal
to B or b.)

It remains to attach meaning to A and a. In the equation $Y' = BX + A$,
it was noted that the constant A was the Y intercept, i.e., the value of Y
where the line cut the y axis. It was also found that a = 0; i.e., that in the
deviation form the line cut the y axis at the origin. Perhaps the student
has already anticipated, by analogy, that the A in our three-variable
equation is the value of $X_1$ where the plane cuts the $x_1$ axis, and that the
value of a will become zero.

Before going farther, it might be well to take a look at the problem 
geometrically. In the case of two variables, after plotting the X and Y 
values in a scattergram, we can readily picture the meaning of B and A,
and also obtain some notion of why certain values of B and A will lead to
better predictions than those obtained by other values. In the case of
three variables, $X_1$, $X_2$, and $X_3$, we have a trio instead of a pair of
measurements. In order to draw up a plot of N such sets of measurements, we will
need to use a three-dimensional scheme. Instead of placing a tally mark in
a cell defined by an interval along the x axis and one along the y axis, we
now have to consider a cell as defined by intervals on the $x_1$, the $x_2$, and the
$x_3$ axes. Instead of a square cell, we have a cubical cell.
Suppose an individual's three scores fall in intervals $i_1$, $i_2$, and $i_3$; then
his "tally" will be placed in the cubicle formed at the intersection of these
three intervals. The total number of cubicles will be the product of the
number of intervals on each axis, and an individual's location in the "box"
will depend on all three of his scores. The student may be at a loss to know
just how he could make such a three-dimensional scattergram. Actually,
this diagram is not necessary, but it is of interest to imagine what such a
three-way distribution would look like. If the correlations, $r_{12}$, $r_{13}$, and $r_{23}$,
are fairly high (and positive), and if we think of the frequencies in the
several cubicles as being represented by dots (or different degrees of
density), then the swarm of dots will extend from the lower left front to the
upper right back of the box. The greatest density will be at the center of
this swarm, and the density or frequency will fall off in all directions from
the center. The swarm will have the general shape and appearance of a
watermelon (ellipsoidal).

Imagine that a plane is to be cut through this swarm. Our job is to so
locate the plane that, when we start upward vertically from any point on
the bottom of the box, say the spot defined by any pair of values for $X_2$ and
$X_3$, we will find that the altitude, i.e., the distance along the $x_1$ axis at which
the plane is reached, will constitute the best estimate of $X_1$ for individuals
having any given $X_2$ and $X_3$ scores. With a little reflection, the reader can
see that, of many ways of placing the plane, some positions will obviously
give very poor estimates, whereas others will lead to better estimates.
What we need is that plane which for the given N sets of $X_1$, $X_2$, and $X_3$
scores will yield the best possible estimates.

The criterion of "best" is a least square affair: the sum of the squares
of the errors of estimate shall be a minimum. The task is really that of
determining the values of A, $B_2$, and $B_3$ in formula (11.1) so that

$$\sum (X_1 - X'_1)^2$$

is a minimum. That is, we are to assign to A, $B_2$, and $B_3$ those values which
will permit the best possible estimate of an unknown $X_1$ when we know the
$X_2$ and $X_3$ values for the individual. The principle to be used is exactly
the same as that employed to obtain the optimum value for B and A for
the two-variable problem, but the present problem is more complicated
because we have to determine the values for three constants.
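The least square criterion can be illustrated with a short sketch, assuming Python with NumPy and hypothetical scores. The helper `fit_plane` is an illustrative name. The plane is fitted once to gross scores and once to deviation scores, so that the slopes and the constant term can be compared.

```python
import numpy as np

# Hypothetical scores for N = 200 individuals.
rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(50, 10, n)
x3 = rng.normal(100, 15, n)
x1 = 0.3 * x2 + 0.1 * x3 + rng.normal(0, 5, n)

def fit_plane(y, a, b):
    """Least squares values of the two slopes and the constant term."""
    design = np.column_stack([a, b, np.ones_like(a)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def sum_sq_errors(y, a, b, coef):
    """Sum of squared errors of estimate for a given plane."""
    return np.sum((y - (coef[0] * a + coef[1] * b + coef[2])) ** 2)

B2, B3, A = fit_plane(x1, x2, x3)
b2, b3, a = fit_plane(x1 - x1.mean(), x2 - x2.mean(), x3 - x3.mean())
print(B2, B3, A)   # gross score constants
print(b2, b3, a)   # deviation score constants

best = sum_sq_errors(x1, x2, x3, (B2, B3, A))
tilted = sum_sq_errors(x1, x2, x3, (B2 + 0.05, B3, A))
print(best, tilted)
```

The slopes agree in the two forms, the deviation-form constant is zero apart from rounding, and any other placement of the plane, such as the tilted one, gives a larger sum of squared errors.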
Derivation of regression equations. Our task is simplified if deviation
scores are used, and we assume a = 0 (if we carried a along, it would prove
to be zero); it is simplified somewhat more if we transform all three sets
of scores into standard score form, i.e., if we set $z = (X - M)/S$. Then
our equation becomes

$$z'_1 = \beta_2 z_2 + \beta_3 z_3 \tag{11.2}$$
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It should be noted that, since we are changing the size of our unit of 
measure, it cannot be argued that $\beta_2$ will equal $B_2$ or $b_2$. The task now is to
determine the value of the beta coefficients, $\beta_2$ and $\beta_3$, so as to have the
best possible estimate of $z_1$, or so that the average of the squared errors, or

$$\frac{1}{N}\sum (z_1 - z'_1)^2$$

shall be a minimum. Since $z_1 - z'_1 = z_1 - \beta_2 z_2 - \beta_3 z_3$, the function, f,
to be minimized is

$$f = \frac{1}{N}\sum (z_1 - \beta_2 z_2 - \beta_3 z_3)^2$$

The calculus is used to determine the values of $\beta_2$ and $\beta_3$ which will make
this function a minimum. We take the partial derivative of the function
first with respect to $\beta_2$, then with respect to $\beta_3$. Thus,

$$\frac{\partial f}{\partial \beta_2} = \frac{-2\sum z_2 (z_1 - \beta_2 z_2 - \beta_3 z_3)}{N}$$

$$\frac{\partial f}{\partial \beta_3} = \frac{-2\sum z_3 (z_1 - \beta_2 z_2 - \beta_3 z_3)}{N}$$

These two derivatives are to be set equal to zero and then solved
simultaneously for the two unknowns, $\beta_2$ and $\beta_3$. Performing the indicated
multiplications, summing, and dividing each equation by 2, we get

$$-\frac{\sum z_1 z_2}{N} + \beta_2 \frac{\sum z_2^2}{N} + \beta_3 \frac{\sum z_2 z_3}{N} = 0$$

$$-\frac{\sum z_1 z_3}{N} + \beta_2 \frac{\sum z_2 z_3}{N} + \beta_3 \frac{\sum z_3^2}{N} = 0$$

Since we are dealing with standard scores, we can now capitalize on
certain properties thereof, namely, that the sum of their squares divided by
N is unity, whereas any sum of cross products divided by N is the
correlation between the two variables involved in the cross products. Thus, we
have

$$-r_{12} + \beta_2 + \beta_3 r_{23} = 0$$

$$-r_{13} + \beta_2 r_{23} + \beta_3 = 0$$

or

$$\beta_2 + r_{23}\beta_3 - r_{12} = 0$$

$$r_{23}\beta_2 + \beta_3 - r_{13} = 0$$
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Since the rs in the equations are determinable for any given sample of data, 
they are in effect knowns, whereas the $\beta$s are unknowns. We therefore have
two simultaneous equations with two unknowns. These can readily be
solved by a number of methods which the student will find in an algebra
textbook. Straightforward solution gives

$$\beta_2 = \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2}$$

$$\beta_3 = \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2}$$

As soon as we have computed the rs, we can easily determine the $\beta$s.
The obtained numerical values can then be substituted in the prediction
equation

$$z'_1 = \beta_2 z_2 + \beta_3 z_3$$

so that for a given pair of $z_2$ and $z_3$ values we can predict the standard
score on the criterion variable. However, in practice it is ordinarily more
convenient to deal with raw scores; hence we need our prediction equation
in raw score form. Obviously, if we replace the zs in the preceding equation
by their values in terms of raw scores, means, and standard deviations, we
will have

$$\frac{X'_1 - M_1}{S_1} = \beta_2 \frac{X_2 - M_2}{S_2} + \beta_3 \frac{X_3 - M_3}{S_3}$$

or

$$\frac{X'_1}{S_1} - \frac{M_1}{S_1} = \beta_2 \frac{X_2}{S_2} - \beta_2 \frac{M_2}{S_2} + \beta_3 \frac{X_3}{S_3} - \beta_3 \frac{M_3}{S_3}$$

Multiplying by $S_1$ and rearranging terms, we have

$$X'_1 = \beta_2 \frac{S_1}{S_2} X_2 + \beta_3 \frac{S_1}{S_3} X_3 + \left(M_1 - \beta_2 \frac{S_1}{S_2} M_2 - \beta_3 \frac{S_1}{S_3} M_3\right) \tag{11.3}$$

from which we see that our original $B_2$ must equal $\beta_2(S_1/S_2)$, $B_3 = \beta_3(S_1/S_3)$,
and A = the parentheses term. Thus we can readily determine the
numerical values of $B_2$, $B_3$, and A and thereby have the constants for the
prediction equation. Actually, the values of $B_2$ and $B_3$ are the optimum
weights to be assigned to $X_2$ and $X_3$ in order to predict $X_1$.
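The route from the three correlations to the raw score constants can be sketched in Python. The correlations, means, and standard deviations below are hypothetical, and the function names are assumptions made for illustration.

```python
def betas_three_variable(r12, r13, r23):
    """Beta weights for predicting variable 1 from variables 2 and 3."""
    denom = 1 - r23**2
    return (r12 - r13 * r23) / denom, (r13 - r12 * r23) / denom

def raw_score_constants(beta2, beta3, means, sds):
    """B2, B3 and A of the raw score equation, from the betas, Ms and Ss."""
    (m1, m2, m3), (s1, s2, s3) = means, sds
    b2 = beta2 * s1 / s2
    b3 = beta3 * s1 / s3
    a = m1 - b2 * m2 - b3 * m3
    return b2, b3, a

# Hypothetical correlations, means, and standard deviations.
r12, r13, r23 = 0.60, 0.50, 0.40
means, sds = (50.0, 100.0, 30.0), (10.0, 15.0, 5.0)

beta2, beta3 = betas_three_variable(r12, r13, r23)
B2, B3, A = raw_score_constants(beta2, beta3, means, sds)
print(beta2, beta3)
print(B2, B3, A)
```

A useful check is that the computed $\beta$s satisfy both of the simultaneous equations, and that the raw score equation returns $M_1$ when $M_2$ and $M_3$ are entered as the predictor scores.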

Error of estimate. The accuracy of the prediction of $X_1$ by the best
combination of $X_2$ and $X_3$ can be ascertained by examining the error term,
i.e., $X_1 - X'_1$ or $S_1(z_1 - z'_1)$. The sum of the squares for the errors divided
by N will yield the variance of the errors. The square root would
correspond to the standard error of estimate. Let $S_{z_{1.23}}$ be this error (in z
units), then

$$S^2_{z_{1.23}} = \frac{\sum (z_1 - z'_1)^2}{N} = \frac{\sum (z_1 - \beta_2 z_2 - \beta_3 z_3)^2}{N}$$

$$= \frac{\sum z_1^2}{N} + \beta_2^2 \frac{\sum z_2^2}{N} + \beta_3^2 \frac{\sum z_3^2}{N} - \frac{2\beta_2 \sum z_1 z_2}{N} - \frac{2\beta_3 \sum z_1 z_3}{N} + \frac{2\beta_2\beta_3 \sum z_2 z_3}{N}$$

$$= 1 + \beta_2^2 + \beta_3^2 - 2\beta_2 r_{12} - 2\beta_3 r_{13} + 2\beta_2\beta_3 r_{23}$$

which by algebraic manipulation reduces to

$$S^2_{z_{1.23}} = 1 - (\beta_2 r_{12} + \beta_3 r_{13}) \tag{11.4}$$

in terms of standard scores. Then $S_1^2$ times this would give the error
variance for raw scores.

Multiple r. We next define the multiple correlation coefficient as
the correlation between $z_1$ and the best estimate of $z_1$ from a knowledge of
$z_2$ and $z_3$. In symbols,

$$r_{1.23} = r_{z_1 z'_1} = \frac{\sum z_1 z'_1}{N S_{z_1} S_{z'_1}} = \frac{\sum z_1 (\beta_2 z_2 + \beta_3 z_3)}{N S_{z'_1}} \tag{11.5}$$

Note that, although $S_{z_1} = 1$, it does not follow that $S_{z'_1} = 1$. In order to
evaluate this last S, we write

$$z_1 = z'_1 + z_{1.23}$$

That is, we think of $z_1$ as being made up of two parts: that which we can
estimate plus a residual. It can easily be shown that these two parts are
independent of each other; hence by the variance theorem we have

$$S^2_{z_1} = S^2_{z'_1} + S^2_{z_{1.23}}$$

or

$$1 = S^2_{z'_1} + S^2_{z_{1.23}}$$

then

$$S^2_{z'_1} = 1 - S^2_{z_{1.23}}$$

But $S^2_{z_{1.23}}$ is nothing more than the variance of the prediction errors as
given by (11.4); therefore

$$S_{z'_1} = \sqrt{\beta_2 r_{12} + \beta_3 r_{13}}$$
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Then, by substituting in formula (11.5), we have 
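$$r_{1.23} = \frac{\sum z_1 (\beta_2 z_2 + \beta_3 z_3)}{N\sqrt{\beta_2 r_{12} + \beta_3 r_{13}}} = \frac{\beta_2 \dfrac{\sum z_1 z_2}{N} + \beta_3 \dfrac{\sum z_1 z_3}{N}}{\sqrt{\beta_2 r_{12} + \beta_3 r_{13}}} = \frac{\beta_2 r_{12} + \beta_3 r_{13}}{\sqrt{\beta_2 r_{12} + \beta_3 r_{13}}}$$

$$= \sqrt{\beta_2 r_{12} + \beta_3 r_{13}} \tag{11.6}$$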
It can also be shown that

$$r^2_{1.23} = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}$$


We thus see that, as soon as the $\beta$s are determined, we can write the
regression equation for predicting $z_1$ from $z_2$ and $z_3$ and can also specify the
degree of correlation and calculate the error of estimate. This error
obviously can be written from formulas (11.4) and (11.6) as

$$S_{1.23} = S_1\sqrt{1 - r^2_{1.23}} \tag{11.7}$$

which is in terms of raw scores.
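As a rough check on formulas (11.6) and (11.7), the following Python sketch computes the multiple r and the error of estimate from three hypothetical correlations. The function names and the numerical values are assumptions for illustration only.

```python
import math

def multiple_r_squared(r12, r13, r23):
    """Squared multiple correlation as beta2*r12 + beta3*r13, formula (11.6)."""
    denom = 1 - r23**2
    beta2 = (r12 - r13 * r23) / denom
    beta3 = (r13 - r12 * r23) / denom
    return beta2 * r12 + beta3 * r13

def error_of_estimate(s1, r_squared):
    """Standard error of estimate in raw score units, formula (11.7)."""
    return s1 * math.sqrt(1 - r_squared)

r12, r13, r23, s1 = 0.60, 0.50, 0.40, 10.0   # hypothetical values
r_sq = multiple_r_squared(r12, r13, r23)

# The same quantity written directly in terms of the three correlations.
direct = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)

print(math.sqrt(r_sq), error_of_estimate(s1, r_sq))
print(r_sq, direct)
```

The two expressions for the squared multiple r agree, since the second is obtained merely by substituting the solutions for the $\beta$s into the first.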
Formula (11.7) has been used frequently to define the multiple
correlation coefficient. Stated explicitly,

$$r^2_{1.23} = 1 - \frac{S^2_{1.23}}{S_1^2}$$

Then, by substituting from (11.4), we again arrive at (11.6).

The student will note the similarity of formula (11.7) to the ordinary 
error of estimate for the bivariate situation. Thus the multiple correlation 
coefficient can be interpreted, in terms of reduction in the error of estimate, 
in exactly the same manner as the ordinary bivariate correlation coefficient. 
The only difference is that we are now determining the regression coeffi- 
cients, or weights for two variables as a team, so as to get the best possible 
prediction of a third variable, whereas in the bivariate situation only one 
regression coefficient is necessary. A multiple correlation coefficient of .60 
has, aside from minor qualifications to be discussed later, the same meaning 
in a predictive sense as an ordinary correlation of .60. Furthermore, the 
interpretation in terms of contribution to variance also holds for the 
multiple correlation coefficient; i.e., if causation can be assumed, it may 
be said that a multiple r of .60 indicates that 36 per cent of the variance in 
the criterion or dependent variable can be attributed to variation in the 
two independent variables. 

Relative weights. The question arises as to the relative importance 
of the two variables as contributors to variation in the criterion variable. 


[...] and the other in pounds); hence they are not comparable at all. If $B_2$ is
numerically twice $B_3$, it does not follow that $X_2$ is twice as important as $X_3$.
In order to get around this difficulty, we must think in terms of standard
scores; these will be [...] will be comparable.

Since

$$S^2_{z_1} = S^2_{z'_1} + S^2_{z_{1.23}}$$

or

$$1 = S^2_{z'_1} + S^2_{z_{1.23}}$$

and

$$r^2_{1.23} = 1 - S^2_{z_{1.23}}$$

it follows that

$$r^2_{1.23} = S^2_{z'_1}$$

That is, $r^2_{1.23}$, which [...] by making N predictions of $z_1$
[...] values of $z_2$ and $z_3$ and then computing the S for the
distribution of these predictions [...]

$$z'_1 = \beta_2 z_2 + \beta_3 z_3$$

we can indicate the value of $S^2_{z'_1}$ as

$$S^2_{z'_1} = \frac{\sum z'^2_1}{N} = \frac{\sum (\beta_2 z_2 + \beta_3 z_3)^2}{N} = \beta_2^2 \frac{\sum z_2^2}{N} + \beta_3^2 \frac{\sum z_3^2}{N} + 2\beta_2\beta_3 \frac{\sum z_2 z_3}{N}$$

which becomes

$$S^2_{z'_1} = \beta_2^2 + \beta_3^2 + 2\beta_2\beta_3 r_{23}$$
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In summary, it can be said that the fundamental problem in multiple 
correlation is that of obtaining the optimum weighting to be assigned to 
independent variables ($X_2$ and $X_3$) in predicting or explaining variation in a
dependent variable, $X_1$. That is, we determine the value of $B_2$, $B_3$, and A
in the equation

$$X'_1 = B_2 X_2 + B_3 X_3 + A$$

so as to get the best possible estimate of $X_1$. This is resolved by working
with the prediction equation in standard score form with $\beta$ coefficients.
The value of each $\beta$ is determinable from the intercorrelations among the
three variables. Once the $\beta$s are calculated, we can: (1) readily compute
the B coefficients needed in the raw score form of the prediction equation;
(2) determine the value of the multiple correlation coefficient and the error
of estimate; (3) ascertain the relative importance of the independent
variables as predictors or, if causation can be assumed, as contributors to
the variance of the dependent or criterion variable. It is important to note
that the multiple correlation coefficient represents the maximum correlation
to be expected between the dependent variable and a linearly additive
combination of $X_2$ and $X_3$.


MORE THAN THREE VARIABLES 


Suppose that we have a dependent variable and four independent
variables which might be used as predictors or which might be thought of
as causes of variation in the dependent variable. The cause and effect, as
opposed to concomitant, relationship among variables is a logical problem
which must be faced by the investigator as a logician rather than as a
statistician. Whether we resort to the multiple correlation technique as an
aid in predicting or as an aid in analysis will depend entirely on the problem
being attacked; the mechanical solution is the same, but the investigator
must choose the interpretation which best suits his purpose.
For a five-variable problem, we need the constants in the regression or
prediction equation,

$$X'_1 = B_2 X_2 + B_3 X_3 + B_4 X_4 + B_5 X_5 + A$$

which can be written in standard score form as

$$z'_1 = \beta_2 z_2 + \beta_3 z_3 + \beta_4 z_4 + \beta_5 z_5$$

As in the three-variable situation, the problem is that of determining the
optimum values of the Bs or the $\beta$s so as to get the best possible prediction
of $X_1$ or $z_1$, i.e., so that

$$\frac{\sum (X_1 - X'_1)^2}{N} \quad \text{or} \quad \frac{\sum (z_1 - z'_1)^2}{N}$$

shall be as small as possible. The mathematical solution is easier by way
of the standard score form of the regression equation. We have the
function

$$f = \frac{\sum (z_1 - z'_1)^2}{N} = \frac{\sum (z_1 - \beta_2 z_2 - \beta_3 z_3 - \beta_4 z_4 - \beta_5 z_5)^2}{N} \tag{11.9}$$

which is to be minimized by assigning proper values to the $\beta$s. These
values are obtained by taking the derivative of the function with respect to,
and in order for, each of the $\beta$s. This will yield four derivatives which
when set equal to zero will give [...] unknown $\beta$s. These equations [...]
equations in order to determine the values of the $\beta$s. The obtained $\beta$s will
be such that the sum of the squares of $z_1 - z'_1$ will be the least possible; i.e.,
we will have the best possible estimate of $z_1$ from an additive combination
of the four independent variables.

The student of the calculus can readily verify that the four equations
obtained by taking derivatives of formula (11.9) will take the following
form (when set equal to zero):

$$\begin{aligned}
\beta_2 + \beta_3 r_{23} + \beta_4 r_{24} + \beta_5 r_{25} - r_{12} &= 0 \\
\beta_2 r_{23} + \beta_3 + \beta_4 r_{34} + \beta_5 r_{35} - r_{13} &= 0 \\
\beta_2 r_{24} + \beta_3 r_{34} + \beta_4 + \beta_5 r_{45} - r_{14} &= 0 \\
\beta_2 r_{25} + \beta_3 r_{35} + \beta_4 r_{45} + \beta_5 - r_{15} &= 0
\end{aligned} \tag{11.10}$$
These equations result from [...] three-variable problem. The four [...]
[...]ation to include any number of variables [...] principles as
utilized here for the three- and the five-variable problem. For n variables,
formula (11.2) becomes

$$z'_1 = \beta_2 z_2 + \beta_3 z_3 + \cdots + \beta_n z_n \tag{11.11}$$

The extension of (11.3) as the gross score equation should be obvious.
Formula (11.6) for the multiple correlation coefficient becomes

$$r_{1.23\cdots n} = \sqrt{\beta_2 r_{12} + \beta_3 r_{13} + \cdots + \beta_n r_{1n}} \tag{11.12}$$

To solve for the unknown $\beta$s, the student may resort to any of the
schemes given in algebra textbooks for solving simultaneous equations.
One method is by way of determinants and Cramer's rule. The coefficients
of the unknowns are the intercorrelations among the four independent
variables, whereas the constants in these equations are the respective
correlations of the dependent with the independent variables. In the
application of Cramer's rule, these constants are thought of as being on the
right-hand side of the equation, i.e., shifted to the right of the equality
mark, with the consequent change of sign. The student should keep in
mind, however, the fact that the original sign of any of the computed
correlation coefficients must be considered.
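For readers with access to a computer, the simultaneous equations can be handed to a general linear solver. The sketch below assumes Python with NumPy and uses a small hypothetical set of correlations. The function name `solve_betas` is an illustrative assumption, and the constants are placed on the right-hand side with their original signs.

```python
import numpy as np

def solve_betas(predictor_corr, criterion_corr):
    """Solve the simultaneous equations for the betas.

    predictor_corr: intercorrelations among the independent variables.
    criterion_corr: correlations of the dependent with each independent variable.
    """
    return np.linalg.solve(predictor_corr, criterion_corr)

# Hypothetical correlations for three independent variables.
predictor_corr = np.array([[1.0, 0.5, 0.3],
                           [0.5, 1.0, 0.4],
                           [0.3, 0.4, 1.0]])
criterion_corr = np.array([0.6, 0.5, 0.4])

betas = solve_betas(predictor_corr, criterion_corr)
multiple_r = np.sqrt(betas @ criterion_corr)   # formula (11.12)
print(betas, multiple_r)
```

Substituting the resulting $\beta$s back into the equations should reproduce the criterion correlations, which serves the same purpose as a final check by hand.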

Solution by Cramer's rule becomes quite tedious and burdensome for a 
problem involving more than four or five variables. Indeed, this deter- 
minantal solution is practically impossible for problems involving a large 
number of variables. Fortunately, there is available a simplified solution, 
but before turning to it, we would like to indicate some algebraic manip- 
ulations in terms of determinants. 

It will be noted from the foregoing simultaneous equations that all the
intercorrelations among the five variables are involved. These correlations
can be conveniently arranged in a table, or in determinantal form. Thus
we can define a major determinant as

$$D = \begin{vmatrix}
1 & r_{12} & r_{13} & r_{14} & r_{15} \\
r_{12} & 1 & r_{23} & r_{24} & r_{25} \\
r_{13} & r_{23} & 1 & r_{34} & r_{35} \\
r_{14} & r_{24} & r_{34} & 1 & r_{45} \\
r_{15} & r_{25} & r_{35} & r_{45} & 1
\end{vmatrix}$$

If we were to delete the first row and first column, the minor which
remains would involve the intercorrelations among the four independent
variables. This minor might be conveniently symbolized as $D_{11}$; i.e., we
have deleted the column and the row which involve the subscript 1. If
we were to delete the row which involves the subscript 1 and the column
involving the subscript 2 throughout, we would symbolize the resulting
minor as $D_{12}$.

Now it can be shown that

$$\beta_2 = \frac{D_{12}}{D_{11}}$$

or any $\beta$, say $\beta_p$, will be

$$\beta_p = (-1)^p \frac{D_{1p}}{D_{11}}$$

where the quantity $(-1)^p$ is an indicator of either a positive or a negative
sign, but the ultimate sign of $\beta_p$ is also dependent on whether the numerical
values of the determinants are positive or negative. It can also be shown
that the multiple correlation coefficient can be written as a function of
determinants, thus

$$r^2_{1.23\cdots n} = 1 - \frac{D}{D_{11}}$$

[...] correlation in terms of d[...]
method.*
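The determinantal expressions can be tried out numerically. The sketch below assumes Python with NumPy and a hypothetical correlation table with the criterion in the first row and column. The names `minor`, `betas_by_determinants`, and `multiple_r_squared` are illustrative assumptions, and the formulas used are the standard Cramer's rule results.

```python
import numpy as np

def minor(matrix, row, col):
    """Determinant left after deleting one row and one column (0-based)."""
    reduced = np.delete(np.delete(matrix, row, axis=0), col, axis=1)
    return np.linalg.det(reduced)

def betas_by_determinants(corr):
    """Betas for predicting variable 1; corr lists the criterion first."""
    n = corr.shape[0]
    d11 = minor(corr, 0, 0)
    # p is the subscript of the predictor (2, 3, ..., n), i.e. column p - 1.
    return np.array([(-1) ** p * minor(corr, 0, p - 1) / d11
                     for p in range(2, n + 1)])

def multiple_r_squared(corr):
    """Squared multiple r as one minus the ratio of D to the minor D11."""
    return 1 - np.linalg.det(corr) / minor(corr, 0, 0)

# Hypothetical intercorrelations; variable 1 (first row) is the criterion.
corr = np.array([[1.0, 0.6, 0.5, 0.4],
                 [0.6, 1.0, 0.5, 0.3],
                 [0.5, 0.5, 1.0, 0.4],
                 [0.4, 0.3, 0.4, 1.0]])

betas = betas_by_determinants(corr)
print(betas, multiple_r_squared(corr))
```

For more than a handful of variables the number of determinants grows quickly, which is why a stepwise elimination scheme is preferred in practice.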


NUMERICAL SOLUTION


Table 11.1. Schema for arranging rs for Doolittle solution

  X2       X3       X4       X5       X1

   1      r23      r24      r25     −r12
           1       r34      r35     −r13
                    1       r45     −r14
                             1      −r15


* Kelley, T. L., Statistical method, New York: Macmillan, 1924.
† Paterson, D. G., et al., Minnesota mechanical ability tests, Minneapolis:
University of Minnesota Press, 1930.


Let
$X_1$ = Criterion (mechanical performance-quality).
$X_2$ = Minnesota assembling test.
$X_3$ = Minnesota spatial relations test.
$X_4$ = Paper form board.
$X_5$ = Interest analysis blank.

Since the several means and standard deviations will be needed, these are
recorded in Table 11.2.


Table 11.2. Means and Ss (Minnesota data)

          X1        X2         X3        X4        X5

M       14.94    127.56    1422.90     46.60    107.00
S        2.09     25.32     296.39     19.45     18.00


Table 11.2. Means and Ss (Minnesota data) 

Xx, X, X3 X, X; 
зз QN ME 
M 1494 127.56 1422.90 46.60 107.00 
S 2.09 25.32 296.39 19.45 18.00 


Table 11.3 gives the Doolittle solution for the $\beta$ coefficients. Once these
are known, the regression equation in raw score form can be written, and
the multiple r and the error of estimate can be determined. The table
includes an indication of the calculation of these values. The student will
have to study the schema of the Doolittle solution carefully in order to
grasp the necessary steps. We shall not attempt a complete exposition of
the steps since the procedure of each step is indicated in the left-hand side
of the table. A few remarks, however, will be of aid to the student.

As already specified, the correlations are written down in an order
corresponding to equations (11.10) except that values to the left and below
the diagonal are omitted. The first thing we do is to set up a check column.
The first entry, 1.92, is obtained by summing, algebraically, the first row of
correlations (including the diagonal 1.00); the second figure, 2.12, is the
sum of the second row plus .56; the third entry, 1.99, is the sum of the
third row plus .49 and .63; and the 1.63 is the sum of the fourth row plus
.42, .46, and .39. The rule being followed should now be obvious: the jth
entry in the check column is obtained by summing the 1.00 in the jth row
with the values above it and to its right. The student should satisfy himself
that this is equivalent to summing the correlations for the respective
equations in (11.10). Since the check column will provide, at intervals, an
automatic check on our computations, this summing should be done at
least twice to insure accuracy.

Line 1 of the solution is obtained by copying down line a, the first row
of rs; and line 2 consists of the line 1 values with the signs changed. The
second part of the solution begins with line 3, which is obtained by
copying down the b row of correlations. Line 4 is obtained by
multiplying entries in line 1 by −.56, which figure is found in line 2 directly
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Table 11.3. Computation of multiple r 


                                       X2       X3       X4       X5       X1    Check

(a)                                   1.00      .56      .49      .42     −.55     1.92
(b)                                            1.00      .63      .46     −.53     2.12
(c)                                                     1.00      .39     −.52     1.99
(d)                                                              1.00     −.64     1.63

(1): line (a)                         1.00      .56      .49      .42     −.55     1.92
(2)                                  −1.00     −.56     −.49     −.42      .55    −1.92

(3): line (b)                                 1.000      .63      .46     −.53     2.12
(4): (1)(−.56)                                −.314    −.274    −.235     .308   −1.075
(5): (3) + (4)                                 .686     .356     .225    −.222    1.045 ck
(6): (5)(−1/.686)                            −1.000    −.519    −.328     .324   −1.524 ck

(7): line (c)                                          1.000      .39     −.52     1.99
(8): (1)(−.49)                                         −.240    −.206     .270    −.941
(9): (5)(−.519)                                        −.185    −.117     .115    −.542
(10): (7) + (8) + (9)                                   .575     .067    −.135     .507 ck
(11): (10)(−1/.575)                                   −1.000    −.116     .235

(12): line (d)                                                  1.000     −.64     1.63
(13): (1)(−.42)                                                 −.176     .231    −.806
(14): (5)(−.328)                                                −.074     .073    −.343
(15): (10)(−.116)                                               −.008     .016    −.059
(16): (12) + (13) + (14) + (15)                                  .742    −.320     .422 ck
(17): (16)(−1/.742)                                            −1.000     .431    −.569 ck

Back solution
From (17)  .431 = $\beta_5$
From (11)  (.431)(−.116) + .235 = $\beta_4$ = .185
From (6)   (.185)(−.519) + (.431)(−.328) + .324 = $\beta_3$ = .087
From (2)   (.087)(−.56) + (.185)(−.49) + (.431)(−.42) + .55 = $\beta_2$ = .230

Final checks
(.230)(1.00) + (.087)( .56) + (.185)( .49) + (.431)( .42) − .55 = .000
(.230)( .56) + (.087)(1.00) + (.185)( .63) + (.431)( .46) − .53 = .001
(.230)( .49) + (.087)( .63) + (.185)(1.00) + (.431)( .39) − .52 = .001
(.230)( .42) + (.087)( .46) + (.185)( .39) + (.431)(1.00) − .64 = .000

From formula (11.3)
$B_2$ = (.230)(2.09/25.32) = .0190,   $B_3$ = (.087)(2.09/296.39) = .0006,
$B_4$ = (.185)(2.09/19.45) = .0199,   $B_5$ = (.431)(2.09/18.00) = .0500,   A = 5.40

Then
$X'_1 = .0190X_2 + .0006X_3 + .0199X_4 + .0500X_5 + 5.40$
$r^2_{1.2345}$ = (.230)(.55) + (.087)(.53) + (.185)(.52) + (.431)(.64) = .54465
$r_{1.2345}$ = .738,   $S_{1.2345} = 2.09\sqrt{1 - (.738)^2}$ = 1.41
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above the 1.000 of line 3. As indicated at the left, line 5 results from sum- 
ming lines 3 and 4, i.e., 1.000 + (—.314) equals .686, etc. 

At this point we have our first automatic check: summing line 5 across 
should yield 1.045, already obtained by vertical summing of values in the 
check column. To be a satisfactory check, these two sums should agree 
within limits consistent with errors imposed by rounding off to three 
decimal places. Acceptable discrepancies will be of the order ±.001,
±.002, . . . ±.005, seldom larger.

Line 6 is obtained by multiplying line 5 by the negative reciprocal of its 
first entry. The correctness of the reciprocal used is evidenced by the fact 
that, when multiplied by .686, unity results. The ck attached to — 1.524 
indicates that summing the entries in line 6 yields the same value as 1.045 
multiplied by the negative reciprocal of .686, thus providing a further 
check. This completes the second part of the solution.

The third part begins with a copying of row c of the correlation table. 
The student should now be able to follow the steps; in particular, he 
should note that a multiplier is secured from the last line of each preceding
part of the solution; that each multiplier is applied in turn to the values in
the line just above it; that, when all such multipliers have been utilized,
the lines are summed (summing across again provides a check), and the
resulting line is, as before, multiplied by the negative reciprocal of its first
entry, thus completing the third part of the solution.

The fourth part involves similar operations. If we had five independent
variables, we would proceed in like fashion, with an additional or fifth
part. The schema can be extended to any number of variables.* There
will be as many parts to the solution as there are independent variables.
The last part always consists of three columns of figures, and the bottom
figure in the middle column is the value for $\beta_n$. In our example $\beta_n$ =
$\beta_5$ = .431.

The other $\beta$s are determined by a “back” solution, which always involves
a substitution of the value or values already found into the last line of the
various parts (lines 11, 6, and 2 in our illustration). This back solution is
given in Table 11.3. As a final check on all the computations, the four $\beta$s
obtained must be substituted into the four simultaneous equations with
which we began. This check appears next in Table 11.3.

In order to put our results into useful form, we ordinarily require the
multiple regression equation in raw score form, and for this we need the B
coefficients and A as called for in formula (11.3) extended for more
variables. To get the multiple correlation coefficient, the $\beta$s and
appropriate rs are substituted in formula (11.12), and from (11.7) we obtain
the standard error appropriate for judging the accuracy of predictions made
by the calculated regression equation. Table 11.3 includes these additional
values.

If the problem involves analysis rather than prediction, there is no need
to set up the regression equation or calculate the error of estimate.
Appropriate interpretations would depend on the $\beta$s and $r_{1.2345}$.

‡ For more than five or six variables, the computations are more economically
accomplished by electronic computers.

SAMPLING ERRORS 


The classical formula for the standard error of a multiple correlation
involving n variables is

$$S_{r_{1.23 \cdots n}} = \frac{1 - r^2_{1.23 \cdots n}}{\sqrt{N - n}} \qquad (11.13)$$


N = 3, it would be 
In general, if n = 
be greater than n 


$$r'_{1.23 \cdots n} = \sqrt{1 - (1 - r^2_{1.23 \cdots n})\,\frac{N - 1}{N - n}} \qquad (11.14)$$

This is sometimes known as a correction for shrinkage, since it has been
observed that in general the correlation between observed and predicted
values for a new sample tends to be less than the multiple r obtained by
means of the $\beta$s computed from the original sample. Obviously, if N is
very large, say 500, and n small, say 10, the amount of bias or expected
shrinkage is so small as to be negligible.
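
The size of the shrinkage can be illustrated numerically. The sketch below assumes the usual form of the shrinkage correction, in which $1 - r^2$ is multiplied by $(N - 1)/(N - n)$; the function name and the multiple r of .60 are hypothetical and chosen only for illustration.

```python
import math

def shrunken_r(r, n_cases, n_vars):
    """Multiple r corrected for shrinkage (assumed form of the correction).

    r       -- multiple r obtained from the original sample
    n_cases -- N, the number of cases
    n_vars  -- n, the number of variables (criterion included)
    """
    r_squared = 1.0 - (1.0 - r * r) * (n_cases - 1) / (n_cases - n_vars)
    # A negative corrected value means the obtained r is no larger than
    # would be expected by chance alone; report zero in that event.
    return math.sqrt(r_squared) if r_squared > 0 else 0.0

# Hypothetical multiple r of .60 based on 10 variables.
large_sample = shrunken_r(0.60, 500, 10)   # N = 500: shrinkage negligible
small_sample = shrunken_r(0.60, 50, 10)    # N = 50: shrinkage appreciable

print(round(large_sample, 3), round(small_sample, 3))
```

With N = 500 the corrected value stays close to .60, whereas with N = 50 it drops noticeably, which is the point made in the text.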


CAUTIONS AND REMARKS 


As already indicated, there are two principal uses for the multiple 
correlation technique: (1) it yields the optimum weighting for combining 
a series of variables in predicting a criterion and provides an indication of 
the accuracy of subsequent predictions; (2) it permits the analyzing of 
variation into component parts. There are certain more or less obvious 
pits into which the unwary user of the multiple regression and correlation 
method may fall. For example, it is possible to write a multiple regression
equation for predicting school achievement ($X_1$) from a knowledge of age
($X_2$) and mental age ($X_3$). In standard score form it might be
$z'_1 = .27z_2 + .67z_3$, from which it might be inferred that school
achievement depends on age to a certain extent but on mental age to a greater
extent. However, it is entirely possible to argue that mental age depends
partly on school achievement. One could also use the same data to write the
regression for age on mental age and school achievement; thus
$z'_2 = .56z_1 + .06z_3$, from which the unwary might conclude that age
depends on school achievement and mental age.

Multiple correlation may be particularly deceptive when we have 
available several variables, each of which yields a rather low correlation 
with the criterion and from which those yielding the higher correlations
with the criterion are selected for the prediction equation. Such selecting
tends to capitalize on correlations which might be high because of sampling
fluctuations. For example, the author was once requested to compute the
multiple r for an 11-variable problem. None of the 10 variables showed
a very high correlation with the criterion, the highest being .27. The
resulting multiple was .44, which was statistically significant for the
sample of 89 cases. When it was learned that 10 variables out of 40 had been
selected as the most promising, i.e., because they showed the highest
correlations with the criterion, the real significance of the multiple r of
.44 was questioned. That it really was misleading was clearly evidenced by the
fact that for a second and similar sample the variable originally yielding
the highest r (.27) now produced an r of −.11. That is, the supposedly
best single predictor was actually of very doubtful value, and this, coupled
with a tendency for the next highest rs to drop appreciably, meant that 
predictions by the regression equation could not be as good as was inferred 
from the multiple of .44. 

Nothing has been said as yet concerning the principal assumption and
consequent limitation in the use of multiple regression equations, namely,
that regressions for the first-order correlations must be linear. There are
methods for handling multiple correlation for curvilinear regressions. The
reader is referred to M. Ezekiel's Methods of correlation analysis.§

It is not obvious from our discussion that, in general, the increase in the
multiple correlation which results from adding variables beyond the first
five or six is very small. This phenomenon of diminishing returns would
not, of course, operate if we were to find an additional variable which
correlated much more highly with the criterion than any of those already
utilized.


Another fact which may not be apparent to the reader is that we can
expect the multiple r to be higher when the intercorrelations among the
predictors are low instead of high. This point can be easily demonstrated
by computing the multiples for, say, $r_{12} = .50$, $r_{13} = .50$, and
varying values for $r_{23}$.

An interesting paradox of multiple correlation and an exception to the
fact mentioned in the previous paragraph is that it is possible to increase
prediction by utilizing a variable which shows no, or low, correlation with
the criterion, provided it correlates well with a variable which does correlate
with the criterion. Thus, if $r_{12} = .400$, $r_{13} = .000$, and
$r_{23} = .707$, the regression equation will be $z'_1 = .800z_2 - .566z_3$,
and $r_{1.23}$ will equal .566. It is thus seen that, when $z_3$ is combined
with $z_2$, an appreciable gain in prediction occurs even though when taken
alone $z_3$ is worthless as a predictor of $z_1$.
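
Both points can be checked numerically. The following is a minimal sketch that assumes the standard three-variable formulas for the standard-score regression weights, $\beta_2 = (r_{12} - r_{13}r_{23})/(1 - r_{23}^2)$ and $\beta_3 = (r_{13} - r_{12}r_{23})/(1 - r_{23}^2)$, with the squared multiple r equal to $\beta_2 r_{12} + \beta_3 r_{13}$; the function name is hypothetical.

```python
import math

def three_variable_solution(r12, r13, r23):
    """Return (beta2, beta3, multiple_r) for predicting z1 from z2 and z3."""
    denom = 1.0 - r23 ** 2
    beta2 = (r12 - r13 * r23) / denom
    beta3 = (r13 - r12 * r23) / denom
    multiple_r = math.sqrt(beta2 * r12 + beta3 * r13)
    return beta2, beta3, multiple_r

# The suppressant case: X3 is uncorrelated with the criterion.
beta2, beta3, r_mult = three_variable_solution(0.400, 0.000, 0.707)
print(round(beta2, 3), round(beta3, 3), round(r_mult, 3))

# Lower intercorrelation between predictors gives a higher multiple r.
for r23 in (0.0, 0.2, 0.5, 0.8):
    print(r23, round(three_variable_solution(0.50, 0.50, r23)[2], 3))
```

The first call should reproduce, within rounding, the weights of .800 and −.566 and the multiple of .566; the loop shows the multiple falling as $r_{23}$ rises while $r_{12}$ and $r_{13}$ stay at .50.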

Such a variable has been termed a “suppressant.” We do not quickly
see just how a suppressant variable, showing no correlation with the
criterion, can increase the prediction. Suppose $X_1$ is composed of
10 elements, $X_2$ of 10, $X_3$ of 5; $X_1$ and $X_2$ have 4 elements
in common, $X_2$ and $X_3$ have 5 elements in common, and $X_1$ and $X_3$
have no overlapping elements. Diagrammatically, the variables and elements
would be

    X1:  a a a a a a b b b b
    X2:              b b b b c d d d d d
    X3:                        d d d d d

Then $r_{12} = .400$, $r_{13} = .000$, $r_{23} = .707$. These lead to
$z'_1 = .800z_2 - .566z_3$, and $r_{1.23} = .566$. Variable $X_3$ has a
negative regression weight, i.e., by the use of $X_3$ something is being
subtracted or suppressed. As set up here for illustrative purposes, all the
elements of $X_3$ are contained in $X_2$; these elements are not related to
$X_1$ and hence their presence in $X_2$ must tend to lower the correlation
between $X_1$ and $X_2$; if these elements could be suppressed, the
correlation between $X_1$ and $X_2$ minus the irrelevant (so far as $X_1$ is
concerned) elements of $X_2$ should be higher than $r_{12}$. Actually, if
we think of the “d” elements of the diagram as being nonexistent, we would
have variation in $X_2$ dependent on only 5 elements, 4 of which overlap with
$X_1$. The correlation between $X_1$ and the abridged $X_2$ would be
$4/\sqrt{10(5)}$ or .566, which has exactly the same value as the multiple r
obtained previously. This exact correspondence to $r_{1.23}$ will be obtained
only when all the $X_3$ elements are contained in $X_2$. If $X_3$ contains
other elements, its use as a suppressant will aid in predicting $X_1$, but
the resulting $r_{1.23}$ will not correspond to an r deducible from the
common element formula. The reason for this is left as an exercise.

§ Ezekiel, M., Methods of correlation analysis, New York: John Wiley and Sons, 1959.

The student, by resort to the notion of common elements, may secure 
a better understanding of the proposition that a higher multiple is obtain- 
able when the correlations with the criterion are high and the correlations 
between the predictors low or zero. The reader should be warned, however, 
that such a condition is hard to realize in practice, as is also the finding of
variables which will qualify as suppressants.


NOTE ON NOTATION 


The symbol $r_{1.23}$ has been used to represent the correlation (multiple)
between $X_1$ and the best combination of $X_2$ and $X_3$. This should not be
confused with $r_{12.3}$, which indicates the correlation (partial) between
$X_1$ and $X_2$ with the effect of $X_3$ ruled out or held constant. The
symbol $S_{y.x}$, it will be recalled, stood for the standard error of
estimate of Y as estimated from X; $S_{1.2}$ would be the error of $X_1$ when
estimated from $X_2$; and $S_{1.23}$ would be the standard error of estimate
of $X_1$ when estimated from $X_2$ and $X_3$ by means of the multiple
regression equation.

In the foregoing discussion, $\beta_2$ has been used as the symbol for the
regression weight of $X_2$. A more formal, albeit cumbersome, notation
would be $\beta_{12.345}$, which would be read as the regression of $X_1$ on
$X_2$, i.e., the coefficient for $X_2$, when used in combination with $X_3$,
$X_4$, and $X_5$. It is not an accident that the subscript pattern resembles
that for the partial correlation coefficient. If we were dealing with a
three-variable problem, $\beta_2$ could be written as $\beta_{12.3}$. This
notation really means that we have the net regression of $X_1$ on $X_2$ when
$X_3$ is held constant. Hence the coefficients are sometimes spoken of as
partial regression coefficients. As a matter of fact, these partial or
multiple regression coefficients can be computed by way of partial
correlation coefficients, but the method is not nearly so straightforward
and self-checking as the Doolittle procedure.


Chapter 12 


OTHER CORRELATION 
METHODS 


The product moment correlation measure is applicable only when the
two variables are graduated, is restricted by the assumption of linearity of
regression, and needs careful qualifying if either or both variables yield
skewed distributions. There are, therefore, many problems for which it is
inappropriate. In general, the majority of the situations which are met in
practice can be handled by some type of correlational technique.

There are no general rules to follow in the case of variables yielding
skewed distributions. Frequently, ... of such a variable and there...
mately normal; ...

... six headings: (1) graduated measures for one variable, dichotomized or
two-category information for the second variable; (2) both variables
dichotomized; (3) three or more categories for one variable and two or
more for the second; (4) three or more categories for one variable and a
graduated series of measures for the other; (5) both variables graduated,
with curvilinear relationship; (6) when data are rank-orders.


An estimate of the degree of correlation for each of the foregoing
situations can be obtained provided certain assumptions concerning the
variables can be regarded as tenable. Ordinarily the graduated variable
can be thought of either as being continuous or as progressing in a sufficient
number of discrete steps so as to give the appearance of continuity. The
approach to normality for such series can, obviously, be specified. The
nature of the categorized variable, whether discrete or continuous, can
ordinarily be ascertained on logical grounds, but the question of whether a
continuous variable for which we have only a distribution by categories
would yield a normal distribution if we had some measuring stick for the
trait is not easy to answer.


BISERIAL CORRELATION 


When one variable is measured in a graduated fashion and the other is in 
the form of a dichotomy, we have the so-called biserial situation, for which 
there are two measures of correlation: biserial r and point biserial r. The 
difference between these two measures depends essentially on the type of
assumption which is made concerning the nature of the dichotomous
variable.

The most typical example of situations calling for one or the other of 
these measures is to be found in the test (mental and personality) field: 
the correlation between an item scored as pass or fail (yes or no, like or
dislike, etc.) and a graduated criterion variable (or a total score on all of a
set of items). We need to know each individual's score on the graduated
variable and the dichotomy to which he belongs. Then we can make a
distribution or scattergram with from 12 to 20 intervals for the graduated
variable along the y axis, and with two intervals for the two categories along
the x axis. Such a correlation scattergram is given in Table 12.1, which
involves pass-fail on “abstract words” vs. composite IQ on Forms L and M
of the 1937 Stanford-Binet. It is obvious that there is a tendency for those
who fail the item to have lower IQs than those who pass; performance on
the item is related to IQ. 

Biserial coefficient, $r_b$. If it can be assumed that underlying the
dichotomy there is a continuous variable, we can obtain a measure of
correlation which is an estimate of what the product moment correlation
would be in case the dichotomous variable were measured in such a way
as to produce a normal distribution. This estimate is given by

$$r_b = \frac{(M_2 - M_1)p_1p_2}{zS_y} \qquad (12.1)$$

or by the exact equivalent

$$r_b = \frac{(M_2 - M_t)p_2}{zS_y} \qquad (12.2)$$

Table 12.1. Biserial table for “abstract words” as X and Binet IQ as Y

                     Item
    IQ         Fail (1)   Pass (2)   Totals

    145-149                   1         1
    140-144
    135-139                   1         1
    130-134                   3         3
    125-129                   4         4
    120-124                   6         6
    115-119                  10        10
    110-114                   7         7
    105-109        1          8         9
    100-104        1          5         6
     95-99         4          9        13
     90-94         7          6        13
     85-89         9          2        11
     80-84         3          1         4
     75-79         4                    4
     70-74         5                    5
     65-69
     60-64         3                    3

    Totals        37         63       100
    Means      84.43     109.86    100.45

$S_y = 17.69$, $p_1 = .37$, $p_2 = .63$, $z = .378$

By formula (12.1):

$$r_b = \frac{(109.86 - 84.43)(.37)(.63)}{(.378)(17.69)} = .89$$

Or by formula (12.2):

$$r_b = \frac{(109.86 - 100.45)(.63)}{(.378)(17.69)} = .89$$

$$r_{pb} = \frac{(109.86 - 100.45)}{17.69}\sqrt{\frac{.63}{.37}} = .69$$


in which

$p_1$ = proportion of cases in the first category.
$p_2$ = proportion of cases in the second category.
$M_1$ = mean of Ys for cases in the first category.
$M_2$ = mean of Ys for cases in the second category.
$M_t$ = mean of all the Y scores.
$S_y$ = S of all the Y scores.
$z$ = ordinate for the unit normal curve at the point where $p_1$ (or $p_2$)
cases are cut off; it is determined by entering $p_1$ or $p_2$, whichever is
smaller, as a q value in Table A, then reading off the adjacent
ordinate value in the fourth column of the table (interpolating if
necessary).

Formula (12.2) is the more convenient when each of a series of items is to
be correlated against the same graduated variable. The computations are
illustrated in Table 12.1.

In the derivation of $r_b$ it is assumed not only that a normal distribution
underlies the dichotomy but also that the regressions would be linear if the
dichotomized variable were measured. The latter assumption cannot be
checked; it is apt to hold for ability variables but may be violated for
personality traits. The former assumption has troubled many. Actually,
the main issue is the question of continuity. Consider the pass-fail
dichotomy; it is obvious that failing a test item represents anything from a
dismal failure up to a near pass, whereas passing the item involves barely
passing up to passing with the greatest of ease. Such a line of reasoning is
certainly presumptive evidence for continuity, and a similar argument can
be advanced as regards yes-no, like-dislike, and similar categories. Given
a continuous trait, it is usually (if not always) possible to construct a test
thereof which yields a normal distribution, and consequently we need not
worry about the mathematical assumption of normality when using $r_b$. We
can justify the use of $r_b$ with obviously continuous variables by saying, as
pointed out earlier, that the obtained coefficient represents what we would
expect the product moment correlation to be if we had a measuring scale,
for the dichotomized trait, which yielded a normal distribution.
The sampling error of biserial r is given approximately by

$$S_{r_b} = \frac{\dfrac{\sqrt{p_1p_2}}{z} - r_b^2}{\sqrt{N}} \qquad (12.3)$$

As an exercise, the student should compare the magnitude of the sampling
error of biserial r for various cuts (p values) with that of the product
moment r as given by the analogous classical form, $S_r = (1 - r^2)/\sqrt{N}$.
It might be anticipated that the sampling error will be large when
dichotomies are extreme, i.e., involve cuts yielding very high (and low) ps.
Thus, if N = 100, and we have a .95-.05 cut, it follows that one of the means
used in computing $r_b$ by formula (12.1) will be based on only five cases and
therefore will be subject to rather large sampling fluctuation, which
incidentally will not be counterbalanced entirely by the relatively greater
stability of the other mean. It may occur to the reader that the use of
formula (12.2) would overcome this difficulty, since it is always possible to
arrange to use the mean for the category having the larger number of
cases, thereby avoiding the unstable mean. This appears plausible enough;
its refutation is left to the student.
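
The comparison suggested as an exercise can be tabulated quickly. The sketch below assumes that the approximate sampling error of biserial r has the form $(\sqrt{p_1p_2}/z - r_b^2)/\sqrt{N}$ and that the classical form for the product moment r is $(1 - r^2)/\sqrt{N}$; the function names, the N of 100, and the r of .50 are chosen only for illustration.

```python
import math
from statistics import NormalDist

def se_biserial(r_b, p1, n_cases):
    """Approximate sampling error of biserial r (assumed form of the formula)."""
    unit_normal = NormalDist()
    z = unit_normal.pdf(unit_normal.inv_cdf(p1))   # ordinate at the cut
    return (math.sqrt(p1 * (1.0 - p1)) / z - r_b ** 2) / math.sqrt(n_cases)

def se_product_moment(r, n_cases):
    """Classical sampling error of the product moment r."""
    return (1.0 - r * r) / math.sqrt(n_cases)

# Sampling errors for N = 100 and r = .50 at increasingly extreme cuts.
for cut in (0.50, 0.30, 0.10, 0.05):
    print(cut,
          round(se_biserial(0.50, cut, 100), 3),
          round(se_product_moment(0.50, 100), 3))
```

The error of the biserial coefficient exceeds that of the product moment r even at a 50-50 cut and grows as the cut becomes more extreme.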

The fact that the sampling error for biserial r is large when extreme 
dichotomies are involved should serve as a warning. Unless N is fairly
large, we should not place much confidence in a biserial r based on cuts 
more extreme than .10 (or .90). 

Since no r to z transformation is available for use with biserial r, the
difficulty of skewed sampling distributions for high $r_b$s cannot be overcome.
In testing the null hypothesis (that no correlation exists), the r term in
formula (12.3) may be dropped. For N small, a more adequate test of the
significance of $r_b$ is possible by way of the t test for the difference
between $M_1$ and $M_2$.

Although $r_b$ is an estimate of ... as to its interpretation. It is, ...
ship between two variables, ... formulas, nor does it lead to ... estimate
would not equal $S_y\sqrt{1 - r_b^2}$. If we have a Y score to use in ...
estimate on the basis of the ten... ’s X position is in the first category;
i.e. ... iction would be correct. But such a pers...

... dichotomous trait is truly discrete, an appropriate measure of
correlation is given by

$$r_{pb} = \frac{M_2 - M_1}{S_y}\sqrt{p_1p_2} \qquad (12.4)$$

or its equivalent

$$r_{pb} = \frac{M_2 - M_t}{S_y}\sqrt{\frac{p_2}{p_1}} \qquad (12.5)$$
ý S, Pi 


Actually, $r_{pb}$ is the product moment correlation between Y and the X
categories scored as either 0 or 1 (scoring as 1 and 2, or as 4 and 10, or any
other two values will yield the same correlation). The value of $r_{pb}$ for
the data of Table 12.1 is .69, compared to an $r_b$ of .89. The magnitude of
$r_{pb}$ tends to be less than that of $r_b$ for the same set of data, as can
be seen by examining the following connection between the two coefficients:

$$r_{pb} = r_b\,\frac{z}{\sqrt{p_1p_2}}$$

For a 50-50 dichotomy, z = .3989 and $r_{pb} = .798r_b$, and as the dichotomy
departs farther and farther from 50-50 the discrepancy between $r_{pb}$ and
$r_b$ increases. For a 10-90 cut we have $r_{pb} = .585r_b$. The maximum
degree of correlation between a dichotomous variable and a normally
distributed variable will occur when there is no overlap between the Y
distributions for the two categories. For such a situation $r_b$ will be
either +1.00 or −1.00 regardless of the cut, whereas $r_{pb}$ will be ±.798
for a 50-50 cut and only ±.585 for a 10-90 cut. These two coefficients are
not on the same scale; they will agree only when there is exactly no
relationship between the two variables. Even if the dichotomous variable were
a genuine point variable, $r_{pb}$ as an expression of the degree of
relationship would not be comparable either to $r_b$ or to the product moment
r between two variables measured in a graduated fashion.
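
The multipliers .798 and .585 quoted above follow from the ratio of the normal ordinate at the cut to $\sqrt{p_1p_2}$. A brief sketch, with a hypothetical function name and the ordinate again computed rather than read from Table A:

```python
import math
from statistics import NormalDist

def point_biserial_ratio(p1):
    """Ratio of point biserial r to biserial r for a cut at proportion p1."""
    unit_normal = NormalDist()
    z = unit_normal.pdf(unit_normal.inv_cdf(p1))   # ordinate at the cut
    return z / math.sqrt(p1 * (1.0 - p1))

ratio_even = point_biserial_ratio(0.50)      # 50-50 cut
ratio_extreme = point_biserial_ratio(0.10)   # 10-90 cut
ratio_table = point_biserial_ratio(0.37)     # the cut of Table 12.1

print(round(ratio_even, 3), round(ratio_extreme, 3))
print(round(0.89 * ratio_table, 2))
```

Applying the ratio for the 37-63 cut of Table 12.1 to the biserial value of .89 should return roughly the point biserial value of .69 reported for those data.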

Despite the fact that true point variables are practically nonexistent in
psychology and despite the difficulties of interpreting $r_{pb}$ as a terminal
descriptive statistic, $r_{pb}$ has a rightful place in certain analytical and
practical work where the two categories are arbitrarily, for convenience,
assigned point scoring values of, say, 0 and 1. For example, if a dichotomized
variable with point scoring were included in an n variable multiple regression
equation, point biserial rs would be the correct values for the correlation
of the dichotomized variable with the remaining n − 1 variables.

For the large sample situation the significance of $r_{pb}$ (as a deviation
from zero) may be tested by using $\sigma_r = 1/\sqrt{N}$ as its standard
error. For small samples, the test for the difference, $M_2 - M_1$, is
appropriate.

A troublesome difficulty with the biserial coefficient, $r_b$, is that it
occasionally exceeds unity. The usually given explanation for this is that
the assumption of normality for the dichotomous variable is not tenable,
but it seems more likely that when such rs occur it is because the graduated
variable, for the combined categories, is either platykurtic or bimodal in
distribution.


TETRACHORIC CORRELATION 


When both variables yield only dichotomized information, as, for 
example, two items scored as passed or failed, it is possible to secure an 
estimate of what the correlation would be if the underlying traits were 
continuous and normally distributed or if they were so measured as to give 
normal distributions. The measure of relationship for such a situation 
is known as the tetrachoric correlation coefficient, usually designated as $r_t$.
It is not feasible to derive here the formula for tetrachoric correlation, but 
perhaps a few words will help us understand the reasoning back of the 
formula. 

Let us suppose that we have before us a scattergram for the correlation 
between height and weight; let us further assume that this scatter exhibits
all the characteristics of a normal correlational surface as defined by
equation (9.16). That is, the two marginal distributions and all the vertical
and horizontal array distributions are normal; the regressions are linear;
and the arrays homoscedastic. For such a normal plot, it is possible,
knowing the degree of correlation and the Ms and Ss of the two variables,
to specify how many or what proportion of the cases will fall in any given
segment of the scatter plot. This can be done by mathematical manipulation
of formula (9.16) or by the aid of Table VIII of Pearson's Tables for
statisticians and biometricians, part II.*

Now, of course, if the student had placed before him a scatter for height
vs. weight and were asked how many cases fell in that portion of the table
below 120 pounds and also below 68 inches, he would simply count them.
But suppose he were told that, when the two axes were cut at 120 pounds
and 68 inches, the frequencies in each of the four quadrants so formed were
as shown in Table 12.2. The purpose of tetrachoric correlation is to
ascertain the degree of correlation which would permit the observed
frequencies in such a fourfold table. A more rigorous statement would be:
Given the four frequencies, what should be the true correlation, for the
scatter underlying the fourfold table, in order to make the obtained four
frequencies most likely?

Table 12.2. Correlation for height and weight dichotomized

                     Below      Above
                     120 lb.    120 lb.

    Above 68 in.       10         80        90
    Below 68 in.       60         50       110

                       70        130       200

In order to secure this estimate it is necessary to convert into a
proportion each of the four frequencies and each of the marginal totals by
dividing by N. For the fourfold table we may symbolize the frequencies as in
Table 12.3, the proportions as in Table 12.4. Then, the tetrachoric
coefficient can be obtained from the following rather forbidding equation:

$$\frac{c - qq'}{zz'} = r_t + \frac{xy}{2}r_t^2 + \frac{(x^2 - 1)(y^2 - 1)}{6}r_t^3 + \frac{(x^3 - 3x)(y^3 - 3y)}{24}r_t^4 \qquad (12.6)$$

in which it is assumed that both q and q′ are less than .50. The general rule
is to choose whichever is smaller, p or q, to pair with whichever is smaller,
p′ or q′. This determines, logically, whether a or b or c or d becomes a part
of the formula. Thus one can have c − qq′ (as given), or b − pp′, each of
which will yield a positive r for positive correlation or a negative r for
negative correlation, or one can have a − q′p or d − qp′, each of which
will yield an r with sign opposite to its true sign. (It is, of course, here
assumed that reading to the right on the x axis and up on the y axis means
more of the traits.)

    Table 12.3. Frequencies            Table 12.4. Proportions

           -      +                           -      +
      +    A      B     A+B              +    a      b      p
      -    C      D     C+D              -    c      d      q

          A+C    B+D     N                    q'     p'    1.00

We must next specify the meaning of the x, y, and zs in formula (12.6).
As for biserial r, z is the ordinate of the unit normal curve where q
proportion of the cases are cut off; z′ has a similar meaning for q′. The
y represents the value on the base line of the unit normal curve where q cases
are cut off, i.e., the x/σ in Table A of the Appendix, and x is similarly
determined from a knowledge of q′.

* Pearson, Karl, Tables for statisticians and biometricians, part II, Cambridge:
Cambridge University Press, 1931.

Additional terms may be added to equation (12.6), which will result in a
closer approximation at the expense of a greater, if not an impossible,
amount of computation. For the given formula, the solution for r involves
determining the roots of a fourth-degree or quartic equation. Either
Horner's or Newton's method, as described in college algebra texts, will
do the trick. The fourth-degree equation will yield satisfactory
approximations except when r is high.
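
Newton's method applied to the quartic can be sketched as follows. The sketch assumes the series form of formula (12.6), with terms in $r$, $xy\,r^2/2$, $(x^2 - 1)(y^2 - 1)r^3/6$, and $(x^3 - 3x)(y^3 - 3y)r^4/24$ set equal to $(c - qq')/zz'$; the function name, the starting value, and the stopping rule are hypothetical choices, and the normal deviates and ordinates are computed rather than read from Table A.

```python
from statistics import NormalDist

def tetrachoric_quartic(cell, q, q_prime, tol=1e-10, max_iter=100):
    """Approximate tetrachoric r from the fourth-degree series by Newton's method.

    cell    -- joint proportion in the cell paired with q and q_prime
    q       -- smaller marginal proportion of one variable (less than .50)
    q_prime -- smaller marginal proportion of the other variable
    """
    unit_normal = NormalDist()
    y = abs(unit_normal.inv_cdf(q))          # base-line value for q
    x = abs(unit_normal.inv_cdf(q_prime))    # base-line value for q'
    z, z_prime = unit_normal.pdf(y), unit_normal.pdf(x)

    target = (cell - q * q_prime) / (z * z_prime)
    c2 = x * y / 2.0
    c3 = (x * x - 1.0) * (y * y - 1.0) / 6.0
    c4 = (x ** 3 - 3.0 * x) * (y ** 3 - 3.0 * y) / 24.0

    r = max(-0.99, min(0.99, target))        # first-order term as a start
    for _ in range(max_iter):
        value = r + c2 * r ** 2 + c3 * r ** 3 + c4 * r ** 4 - target
        slope = 1.0 + 2.0 * c2 * r + 3.0 * c3 * r ** 2 + 4.0 * c4 * r ** 3
        step = value / slope
        r -= step
        if abs(step) < tol:
            break
    return r

# With 50-50 cuts and a true correlation of .50, the paired cell holds
# one third of the cases.
r_t = tetrachoric_quartic(1.0 / 3.0, 0.50, 0.50)
print(round(r_t, 3))
```

For this moderate correlation the truncated series recovers a value close to .50; as the text notes, the approximation becomes less satisfactory when r is high.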

The solution of a quartic equation is not difficult, nor is it so easy as to
lead to mass production of tetrachoric rs. Fortunately, it is no longer
necessary to go through this tedious method for getting an approximation
to the value of $r_t$. Diagrams† are available which enable us to determine
quickly the value of $r_t$ for any given table of proportionate frequencies.
Anyone having as many as a half-dozen tetrachorics to compute will find
it economical to possess a copy of these diagrams.

The tetrachoric r is particularly useful in estimating the degree of
correlation between variables for which we have only dichotomized
information, but it can also be used instead of biserial r or the product
moment r, since situations for which these two methods apply can readily
be converted into fourfold tables by simply dichotomizing the graduated
variables. The advantage of so estimating correlation is that tetrachoric r
is much easier to determine (by using the computing diagrams) than is
calculating either biserial r or the product moment r. Indeed, this fact of
computational economy has led a number of investigators to use $r_t$ when
product moment rs could be determined. That such a practice may be
short-sighted economy becomes quite evident when we turn to the sampling
fluctuation of $r_t$.

† Chesire, L., Saffir, M., and Thurstone, L. L., Computing diagrams for the tetrachoric
correlation coefficient, Chicago: University of Chicago Bookstore, 1933.


The standard error of $r_t$ is closely approximated by

$$S_{r_t} = \frac{\sqrt{pqp'q'}}{zz'\sqrt{N}}\sqrt{(1 - r_t^2)\left[1 - \left(\frac{\sin^{-1} r_t}{90^\circ}\right)^2\right]} \qquad (12.7)$$

Table 12.5. Sampling errors of $r_t$ and r compared

    r or r_t     p      p'     S_rt     S_r

      .00       .50    .50     .157     .100
      .00       .80    .80     .204     .100
      .40       .50    .50     .130     .084
      .40       .80    .80     .182     .084
      .60       .50    .50     .115     .064
      .60       .80    .80     .150     .064
      .80       .50    .50     .073     .036
      .80       .80    .80     .095     .036

It can readil... fact, even for ... the same degree of sampling stability
for a tetrachoric as for a product moment correlation coefficient.
For .80-.20 cuts and low correlations, four times as many cases are needed
to have comparable sampling errors. For high correlations and also for
more extreme cuts, $r_t$ compares still less favorably with r.

The foregoing discussion and further study of formula (12.7) lead to two
obvious conclusions.

First, the increasing sampling instability of $r_t$ as the dichotomies become
more extreme warns us that, unless N is large, we cannot place much
reliance on $r_t$ for cuts more extreme than .10-.90; seldom will N be large
enough to warrant confidence in a tetrachoric based on cuts more extreme
than .05-.95.

Second, in using $r_t$ instead of the product moment r when the latter is
calculable, we are always throwing away the equivalent of more than half
the available data. Thus the computational economy may be an expensive
luxury; it is very doubtful whether the calculation of a product moment r
for N cases will ever require anything but a fraction of the expense of
securing data on the additional N cases needed to counterbalance the
greater sampling error incurred in using the tetrachoric coefficient.

As in the case of $r_b$, no r to z transformation exists for handling the
sampling errors of high tetrachorics. For testing the null hypothesis, that
$r_t$ for the universe is zero, we may use a simpler expression for its
standard error, namely, $S_{r_t} = \sqrt{pqp'q'}/(zz'\sqrt{N})$. Another
method for judging the significance of the correlation computed from a
fourfold table will be presented in the next chapter.
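
The cost in cases can be made concrete with the null-hypothesis form of the standard error. The sketch below assumes that form to be $\sqrt{pqp'q'}/(zz'\sqrt{N})$ and compares it with $1/\sqrt{N}$ for the product moment r; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def null_se_tetrachoric(p, p_prime, n_cases):
    """Standard error of tetrachoric r when the universe value is zero."""
    unit_normal = NormalDist()
    z = unit_normal.pdf(unit_normal.inv_cdf(p))              # ordinate at first cut
    z_prime = unit_normal.pdf(unit_normal.inv_cdf(p_prime))  # ordinate at second cut
    q, q_prime = 1.0 - p, 1.0 - p_prime
    return math.sqrt(p * q * p_prime * q_prime) / (z * z_prime * math.sqrt(n_cases))

def cases_needed_ratio(p, p_prime):
    """How many times more cases the tetrachoric needs to match 1/sqrt(N)."""
    n_cases = 100                      # any N gives the same ratio
    se_r = 1.0 / math.sqrt(n_cases)
    return (null_se_tetrachoric(p, p_prime, n_cases) / se_r) ** 2

print(round(null_se_tetrachoric(0.50, 0.50, 100), 3))
print(round(null_se_tetrachoric(0.80, 0.80, 100), 3))
print(round(cases_needed_ratio(0.50, 0.50), 2),
      round(cases_needed_ratio(0.80, 0.80), 2))
```

For N = 100 the two standard errors should come out near the .157 and .204 of Table 12.5, and the ratios (roughly two and a half for 50-50 cuts and a little over four for .80-.20 cuts) show how many more cases the tetrachoric requires for comparable stability.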

The use of tetrachoric r is circumscribed by an assumption that the 
underlying correlational surface is of the normal type. Among other things 
this implies (1) that the dichotomized traits are continuous and normally 
distributed, and (2) that the regressions are linear. Although, as discussed 
in connection with biserial r, we are usually ignorant of the tenability of 1, 
this ignorance can be partially overcome if the correlation is regarded as 
that which would obtain if the traits were normalized; i.e., it can be 
argued that the use of tetrachoric r automatically normalizes the distribu- 
tions. It is not so easy to dispose of assumption 2, since the normalizing of 
variables will not necessarily lead to linearity of regression. The only 
consolation here is that measured psychological traits are usually linearly
related, if related at all.


FOURFOLD POINT CORRELATION 


If we can safely assume point distributions for both dichotomous variables,
a descriptive measure of correlation can be obtained from a fourfold
table (Table 12.3) by

$$r_p = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}} \tag{12.8}$$

or from the table of proportionate frequencies (Table 12.4) by the exact
equivalent

$$r_p = \frac{bc - ad}{\sqrt{p\,q\,p'\,q'}} \tag{12.9}$$

The fourfold point correlation coefficient is frequently referred to as the
phi coefficient and designated by $\phi$. Actually, it is the product moment
correlation between the two variables each scored in a point fashion (say,
0 and 1). Unlike the point biserial, $r_p$ can be unity but only when $p = p'$.
Otherwise (i.e., in nearly all situations) $r_p$ and $r_t$ from the same table will
differ in value, with $r_p$ being lower, and the difference between the two
becomes greater as the dichotomy for either variable, or both, varies
farther and farther from 50-50.

A few examples will illustrate the difference in the magnitude of $r_t$ and
$r_p$. It is possible to have a fourfold table with 50-50 and 50-50 cuts which
yields an $r_t$ of .50 and an $r_p$ of .32, and a table with 16-84 and 16-84 cuts
which yields an $r_t$ of .50 and an $r_p$ of only .26. For similar tables (as regards
cuts) we may have $r_p$ values of .59 and .52 when $r_t$ is .80. Thus, $r_p$ is not
interpretable on the same scale as $r_t$ (or $r$ or $r_b$) as a measure (terminal
descriptive statistic) of the degree of relationship.

However, $r_p$ is useful (and necessary) in certain analytical work. If
variable U and variable V were dichotomous and each scored as 0 and 1,
then $r_p$ would be the appropriate value to use in formula (9.9) to obtain the
variance of W, defined as U + V. If formula (5.5) for the standard error of
the difference between correlated proportions were written analogously to
formula (6.8), $r_p$ would be used. It is also used in the statistical theory of
mental tests.

For testing whether $r_p$ deviates significantly from zero we may safely use
$1/\sqrt{N}$ as its standard error when N is not small.
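The statement that the fourfold point coefficient is simply the product moment correlation of 0/1 scores can be checked numerically. The sketch below is illustrative only: the cell frequencies are made up, and the assignment of cells to score patterns (top row scored 1 on Y, right column scored 1 on X) is an assumption about the layout, chosen so that the sign agrees with formula (12.8).

```python
import math

def phi_from_counts(A, B, C, D):
    """Fourfold point correlation from cell frequencies, as in formula (12.8)."""
    denominator = math.sqrt((A + B) * (C + D) * (A + C) * (B + D))
    return (B * C - A * D) / denominator

def pearson_r(xs, ys):
    """Ordinary product moment correlation between two score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical frequencies. Assumed layout: top row is Y = 1, bottom row is Y = 0;
# left column is X = 0, right column is X = 1, so cell B holds the (1, 1) cases.
A, B, C, D = 10, 40, 30, 20
xs = [0] * A + [1] * B + [0] * C + [1] * D
ys = [1] * A + [1] * B + [0] * C + [0] * D

phi = phi_from_counts(A, B, C, D)
r_scored = pearson_r(xs, ys)
# Large-sample standard error for testing the coefficient against zero.
standard_error = 1 / math.sqrt(A + B + C + D)
print(round(phi, 4), round(r_scored, 4), standard_error)
```

The two coefficients agree to rounding error, and dividing the coefficient by the standard error gives the usual critical ratio.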


CONTINGENCY COEFFICIENT 


The contingency coefficient is a measure of the degree of association or
correlation which exists between variables for which we have only categorical
information. The number of categories can be such as to provide a
2 by 2 table (as for tetrachoric correlation) or a 2 by 3, or a 3 by 3, or a
3 by 4, or a 4 by 4, or a k by l table. This coefficient is stated in terms of a
quantity known as $\chi^2$ (chi square) thus

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \tag{12.10}$$

where

$$\chi^2 = \Sigma \frac{(O - E)^2}{E} \tag{12.11}$$

in which O is the observed frequency (not percentage) and E is the
expected frequency for a given cell. In a 2 by 3 table there would be six
cells, hence six values summed to get $\chi^2$. The expected cell frequencies for
the contingency situation are those frequencies which would exist if there
were no association or relationship between the given variables. It can
thus be anticipated that, the larger the discrepancy between expected and
observed frequencies relative to the expected, the larger the value of $\chi^2$ and
consequently the higher the value of C.

An example will help to clarify the preceding. Suppose that we have
two variables, each of which yields three categories or classifications, and
that the observed frequencies are as given in Table 12.6, which also contains
the expected frequencies in parentheses. (Fictitious data; marginal
frequencies arranged so as to simplify exposition.)


Table 12.6. Contingency table

                  Low        Medium        High      Total
College          5 (20)     45 (60)      50 (20)      100
High school     50 (40)    110 (120)     40 (40)      200
Grade school    45 (40)    145 (120)     10 (40)      200
Total          100         300          100           500


In order to ascertain
the expected frequencies needed in the computation of $\chi^2$, we ask what
cell frequencies would be expected if there were no relationship, or zero
association, between the two variables. Consider the 100 classified as
college; if no association existed, we would expect that these 100 would be
distributed according to a 1, 3, 1 ratio, i.e., in the same ratio as the
marginal frequencies at the bottom. Thus the expected cell frequencies for
the top row of cells would be 20, 60, 20. The expected frequencies for the
middle and bottom rows of cells should also be in a 1, 3, 1 ratio. Both these
rows would have expected frequencies of 40, 120, 40.

It will be noted that (1) the expected frequencies for the columns follow, 
as they should, the ratio of 1, 2, 2, i.e., the ratio of 100, 200, 200 for the 
marginal frequencies on the right; (2) the expected frequencies sum to the 
same marginal totals as the observed frequencies; and (3) the expected 
frequencies actually exhibit a zero relationship between the two characteristics.

In practice, the computation of the expected frequencies can readily be
accomplished by either of two schemes: (1) express the marginal totals 
along the bottom as proportions of the total N, then multiply each of the 
frequencies on the right margin by each proportion in turn, entering the 
resulting product in the cell common to the two marginal figures involved 
in the multiplication; or (2) multiply any frequency on the bottom margin 
by any frequency on the right margin, and then divide this product by N; 
the result is the expected frequency for the cell common to the two mar- 
ginals involved in the products. 

The computation of $\chi^2$ is now a routine matter. We simply take each
cell in turn, square the difference between the observed and expected value,
and divide by the expected frequency. Thus we have

    $(5 - 20)^2/20 = 11.25$
    $(45 - 60)^2/60 = 3.75$
    $(50 - 20)^2/20 = 45.00$
    $(50 - 40)^2/40 = 2.50$
    $(110 - 120)^2/120 = .83$
    $(40 - 40)^2/40 = .00$
    $(45 - 40)^2/40 = .62$
    $(145 - 120)^2/120 = 5.21$
    $(10 - 40)^2/40 = 22.50$

The sum of these quantities, 91.66, is $\chi^2$. To get C, the coefficient of
contingency, the value of $\chi^2$ is substituted in formula (12.10), thus

$$C = \sqrt{\frac{91.66}{500 + 91.66}} = .39$$

This strength of association is not […] same d[…] categories. The upper
limit for a 2 by 2 table is $\sqrt{1/2}$; for a 3 by 3 table, $\sqrt{2/3}$;
for a 4 by 4 table, $\sqrt{3/4}$; for a 5 by 5 table, $\sqrt{4/5}$; for a k by k table,
$\sqrt{(k - 1)/k}$. The exact upper limits for rectangular tables, such as 2 by 3,
2 by 4, 3 by 4, are unknown. (As an exercise, the student might demonstrate
to his own satisfaction the upper limit for 2 by 2 and 3 by 3 tables.) The
reader will also note that C can never be negative.

Despite having varying maximal values, contingency coefficients have a
decided advantage over other measures of relationship; no assumptions
involving the nature of the variables need be met: continuous or discrete
variables, normal or skewed or any shaped distributions for underlying
traits, ordered or unordered series, and combinations thereof are permissible.
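The arithmetic for Table 12.6 can be retraced in a few lines. The following is a minimal sketch, not a general-purpose routine: it uses the second scheme described above for expected frequencies (row total times column total, divided by N), then formulas (12.11) and (12.10). The function names are illustrative.

```python
import math

# Observed frequencies of Table 12.6 (rows: College, High school, Grade school;
# columns: Low, Medium, High).
observed = [
    [5, 45, 50],
    [50, 110, 40],
    [45, 145, 10],
]

def expected_frequencies(table):
    """Expected cell frequencies under zero association: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def chi_square(table):
    """Sum of (O - E)^2 / E over all cells, formula (12.11)."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def contingency_coefficient(table):
    """C = sqrt(chi^2 / (N + chi^2)), formula (12.10)."""
    n = sum(sum(row) for row in table)
    chi2 = chi_square(table)
    return math.sqrt(chi2 / (n + chi2))

expected = expected_frequencies(observed)
chi2 = chi_square(observed)
c = contingency_coefficient(observed)
# Upper limit of C for a square k by k table, here k = 3.
c_max = math.sqrt((3 - 1) / 3)
print(expected, round(chi2, 2), round(c, 2), round(c_max, 3))
```

Carrying unrounded terms gives a sum of about 91.67 rather than 91.66; the difference comes only from rounding the individual contributions, and C is .39 either way.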

Disadvantages are that any two contingency coefficients are not comparable
unless derived from tables of the same size, that they are noncomparable
to product moment rs (and estimates thereof) unless certain
corrections are applied, and that the formula for sampling error is unwieldy.
The necessary corrections and the sampling error formula may be found in
Kelley,¹ but before consulting Kelley, the reader might bear in mind the
following comments.

In regard to the corrections, the first is for number of categories. The 
additional correction to make C an estimate of r involves the assumption 
that the underlying traits are continuous and normal in distribution. 
Furthermore, this correction is very tedious to make. It is suggested that, 
if the assumption of normally distributed continuous variables is tenable, 
we are justified in reducing a contingency table of more than four cells to a
2 by 2 table and then determining the value of tetrachoric r. When reducing
to a fourfold table, we should combine adjacent categories so as to have 
dichotomies as near to .50-.50 proportions as possible. The combination 
should not be made on the basis of the pattern of cell frequencies, since this 
is likely to involve a capitalization or decapitalization on chance. We 
might take several or all possible fourfold combinations, thus securing 
several tetrachoric rs which may then be averaged. 

As to the unwieldy sampling error formula for C, it is suggested that
insofar as we wish simply to test the null hypothesis, i.e., that there is no
relationship between the two given variables, we need only enter the
value of $\chi^2$ into an appropriate probability table to test its significance.
If $\chi^2$ is significant, the relationship is significantly greater than zero. This
use of $\chi^2$ will be discussed in Chapter 13.

Chi square for a fourfold table can be readily obtained by formula
without first computing expected frequencies. Thus for a set of frequencies
like that of Table 12.3 we have

$$\chi^2 = \frac{N(BC - AD)^2}{(A + B)(C + D)(A + C)(B + D)}$$
This resembles formula (12.8). In fact, there is a relationship between the
fourfold point coefficient ($r_p$), $\chi^2$, and $C$:

$$r_p^2 = \frac{\chi^2}{N} = \frac{C^2}{1 - C^2}$$
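The link between the fourfold quantities can be verified numerically. The sketch below uses made-up frequencies and illustrative function names; it computes chi square for a 2 by 2 table both by the shortcut formula and cell by cell from expected frequencies, then compares chi square over N with the squared fourfold point coefficient and with the contingency coefficient.

```python
import math

def chi_square_fourfold(A, B, C, D):
    """Shortcut chi square for a 2 by 2 table, with no expected frequencies needed."""
    n = A + B + C + D
    return n * (B * C - A * D) ** 2 / ((A + B) * (C + D) * (A + C) * (B + D))

def chi_square_from_expected(A, B, C, D):
    """The same quantity computed cell by cell as the sum of (O - E)^2 / E."""
    n = A + B + C + D
    rows = [(A, B), (C, D)]
    col_totals = (A + C, B + D)
    total = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            total += (observed - expected) ** 2 / expected
    return total

# Hypothetical cell frequencies laid out as A B / C D.
A, B, C, D = 10, 40, 30, 20
n = A + B + C + D

chi2 = chi_square_fourfold(A, B, C, D)
phi = (B * C - A * D) / math.sqrt((A + B) * (C + D) * (A + C) * (B + D))
contingency_c = math.sqrt(chi2 / (n + chi2))

print(round(chi2, 4), round(phi ** 2, 4), round(contingency_c, 4))
```

For these frequencies chi square divided by N, the squared coefficient, and $C^2/(1 - C^2)$ all come to the same value, as the algebra requires.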


Other measures of association or of correlation between attributes have
been advocated. This is not the place to argue the pros and cons of these
other measures. It seems to the author that the measures we have discussed
are the more defensible.

¹ Kelley, T. L., Statistical method, pp. 266-271, New York: Macmillan, 1924.


THE CORRELATION RATIO OR $\eta$ (ETA)


[…]nce of the variable being predicted by a linear regression line.
If the array means fail to fall on a straight line, it can rightly be argued
that better prediction can be made by using a curve which really "fits" the
[…]

[…] the vertical arrays may be labeled $S^2_{ay}$, for the horizontal arrays, $S^2_{ax}$.
The correlation ratio, $\eta$, in terms o[…] predicted from Ys is defined as

$$\eta^2_{yx} = 1 - \frac{S^2_{ay}}{S^2_y} \tag{12.12}$$

$$\eta^2_{xy} = 1 - \frac{S^2_{ax}}{S^2_x} \tag{12.13}$$

Are two $\eta$s necessar[…]

[…]d when the regression is linear; hence it is more
generally applicable than the product moment coefficient, which is useful
only in the special case where the assumption of linearity is tenable. The
correlation ratio, however, does not enter into the regression equation
constants.

[…]ed herein that the variance about the mean is smalle[…]er point, but
this fact is readily deducible from th[…]ula for S in terms of deviations
from an arbitrary origin. If AO coincid[…]; if AO does not coinci[…]

Even if the regressions were exactly linear for some defined population,
a given sample would show deviations from linearity, and therefore $\eta$s for
successive samples would show chance sampling deviations from r. By
how much must $\eta$ exceed r before we suspect curvilinearity? The only
adequate statistical test for answering this question involves the analysis of
variance technique and hence is postponed to Chapter 15.

Another definition of $\eta$ can be had by starting with the proposition that
the variance $S^2_y$ can be broken down into components, a predictable and
an unpredictable part, or $S^2_y = S^2_{my} + S^2_{ay}$, in which $S^2_{my}$ is the variance
of the array means weighted for the number of cases in the several arrays.
Then we have $\eta$ defined as $\eta^2_{yx} = S^2_{my}/S^2_y$ and also as $\eta^2_{xy} = S^2_{mx}/S^2_x$.
These are analogous to $r^2 = S^2_{y'}/S^2_y$ and $r^2 = S^2_{x'}/S^2_x$, and accordingly we
may interpret $\eta^2_{yx}$ as the proportion of Y variance explained by or associated
with variation in X.

Since the $\eta$s are most readily computed by methods to be developed
later (pp. 278-80), no illustration will be given here.
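Although the text defers computation, the two definitions of the correlation ratio can be compared on a toy data set. The sketch below is an illustration under stated assumptions: the scores are invented and deliberately curvilinear, the X values are treated as array labels, and the function name is hypothetical. It computes the squared ratio for Y predicted from X both as the weighted variance of array means over total variance and as one minus the within-array variance over total variance.

```python
def correlation_ratio_squared(x_labels, y_values):
    """Return eta squared for Y on X computed two ways (they should agree)."""
    n = len(y_values)
    grand_mean = sum(y_values) / n

    # Group the Y scores into arrays, one array per distinct X value.
    arrays = {}
    for x, y in zip(x_labels, y_values):
        arrays.setdefault(x, []).append(y)

    ss_total = sum((y - grand_mean) ** 2 for y in y_values)
    # Squared deviations of array means from the grand mean, weighted by array size.
    ss_means = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in arrays.values()
    )
    # Squared deviations of scores from their own array mean.
    ss_within = sum(
        (y - sum(ys) / len(ys)) ** 2 for ys in arrays.values() for y in ys
    )
    return ss_means / ss_total, 1 - ss_within / ss_total

# Hypothetical data: Y rises and then falls as X increases.
x = [1, 1, 2, 2, 3, 3]
y = [1, 3, 5, 7, 2, 4]
eta_sq_from_means, eta_sq_from_within = correlation_ratio_squared(x, y)
print(round(eta_sq_from_means, 4), round(eta_sq_from_within, 4))
```

For these scores the squared correlation ratio is about .74, far above the squared linear r (about .04), which is the kind of gap that suggests curvilinearity.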


RANK CORRELATION 


Rank-ordering by judges is frequently resorted to when no measuring
instrument is available for a trait. One measure of relationship between
variables for which we have individuals ranked is given by $\rho$ (rho), the
Spearman rank-difference correlation coefficient:

$$\rho = 1 - \frac{6\,\Sigma D^2}{N(N^2 - 1)} \tag{12.14}$$

in which D is the difference between an individual's two ranks (for the two
traits). When we have ranks for one variable and scores for the other we
can use the scores as a basis for setting up ranks for the latter, and then
compute rho.

Whenever rankings on a given variable involve ties (the judges fail to
distinguish between two or more individuals or the scores used for ranking
are such that two or more persons have the same score), the ranks are split
between individuals who are in tie positions. Suppose three ranks have
been assigned and that two individuals are tied for the fourth position. If
they were distinguishable, they would use up ranks 4 and 5, so we assign
each a value of 4.5. Had three persons tied for this position, we would split
ranks 4, 5, and 6, giving each a rank of 5. Then when we proceed to the
remaining individuals we must remember that rank position 6 has been used.

The computation of rho is illustrated in Table 12.7. The fact that
the algebraic sum of the Ds must be zero can be utilized as a means of
checking the D column values.


Table 12.7. Computation of rank-difference
correlation coefficient

               Ranks          Differences
Persons     1st     2nd        D        D²

   A          3      1         2        4
   B          4      2         2        4
   C         10     10         0        0
   D          8      4.5       3.5     12.25
   E          5      6        −1        1
   F          9     11        −2        4
   G          1      3        −2        4
   H          2      7        −5       25
   I         13     13         0        0
   J         11      4.5       6.5     42.25
   K          7      8.5      −1.5      2.25
   L          6      8.5      −2.5      6.25
   M         12     12         0        0


To test the significance of rho, for N of 10 or more, we may safely use

$$t = \rho\sqrt{\frac{N - 2}{1 - \rho^2}}$$

with $N - 2$ degrees of freedom.
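The rank-difference procedure, including the splitting of tied ranks, can be sketched as follows. The ranks are those of Table 12.7; the function names and the small tie example are illustrative assumptions, not part of the text.

```python
import math

def average_ranks(scores, highest_first=True):
    """Rank scores, giving tied scores the mean of the ranks they would occupy."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=highest_first)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of rank positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman_rho(ranks_1, ranks_2):
    """Formula (12.14): 1 - 6 * sum(D^2) / (N (N^2 - 1))."""
    n = len(ranks_1)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))

def t_for_rho(rho, n):
    """t ratio for testing rho against zero, with n - 2 degrees of freedom."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))

# Ranks from Table 12.7, persons A through M.
first = [3, 4, 10, 8, 5, 9, 1, 2, 13, 11, 7, 6, 12]
second = [1, 2, 10, 4.5, 6, 11, 3, 7, 13, 4.5, 8.5, 8.5, 12]

differences = [a - b for a, b in zip(first, second)]
rho = spearman_rho(first, second)
t = t_for_rho(rho, len(first))
print(sum(differences), round(rho, 2), round(t, 2))

# Hypothetical scores with one tie: the two 85s split ranks 2 and 3.
print(average_ranks([90, 85, 85, 70]))
```

The Ds sum to zero, as the check described above requires, and the squared differences sum to 105, which gives a rho of about .71 for N = 13.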
[…]tical advantages inherent in $r$, and […]servations on one or both variables […]

Because of judgmental difficulties in […] order data are apt to be confined to […]

Kendall¹ has proposed another measure, designated $\tau$
(tau), for use with ranks which is superior to rho insofar as testing significance
is concerned when N is very small. As a measure of the degree of
relationship, tau, like rho, has the property of being unity for a perfect
relationship; for zero and near zero correlation these two measures tend to
be alike numerically, but for other degrees of association tau tends to be
lower than rho, at times only two-thirds the magnitude of rho. Thus tau
is not comparable with rho (and r), and furthermore there seems to be no
specifiable way of estimating one from the other. For a much more
adequate discussion of both tau and rho, the reader is referred to Kendall.

¹ Kendall, M. G., Rank correlation methods, London: Griffin, 1948.


THE DISCRIMINANT FUNCTION 


Suppose we have two or more variables (measured in a graduated
fashion) which we wish to combine into a total score for the purpose of
discriminating between two groups. The question arises as to how best
to weight the variables so as to obtain maximum difference between the total
score means for the two groups. This difference must be considered
relative to the within-groups variability; otherwise we could easily produce
a large numerical difference by the simple operation of summing the scores
and multiplying by a large constant, whereas the real purpose is to have
score distributions with the least amount of overlap for the two groups.
We want the difference to be maximal relative to the spread of scores
within the groups.

The simplest way to determine the weights for the several variables is to
compute the $\beta$s, thence the Bs, as in the multiple regression problem. For
this purpose, the product moment correlations among the two or more
independent variables are calculated, and the point biserial r is calculated
between each independent variable and $X_1$, the dependent variable (membership
in one or the other of the two groups, with one of the groups consistently
designated as corresponding to the first category for the biserial
[…]).

Actually, since the problem here is that of ascertaining optimum relative
weights rather than fitting a regression plane, we need not calculate
the A of the regression equation nor worry about $S_1$ (= […]
setup). The weights may be taken simply as $\beta_2/S_2$, $\beta_3/S_3$, etc., all multiplied
by a constant so chosen as to have weights which exceed, say, 10, thereby
avoiding decimals. Some of the weights may be negative, according to the
sign of the corresponding $\beta$. If all or a majority of the weights are negative,
the signs of all may be reversed. The relationship of the total of the optimally
weighted scores to group membership is describable by the multiple
r computed by equation (11.12). Such a multiple r is the point biserial
between the total weighted scores and belonging to one or the other of
the two groups. Or we may compute the weighted scores for all N cases
and then make distributions for the two groups separately in order to
scrutinize the amount of differentiation (or overlap) present.
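For the case of just two independent variables the weighting scheme can be sketched directly. Everything in the example is an assumption made for illustration: the scores are invented, group membership is coded 1 and 0, the betas come from the standard two-predictor solution of the normal equations, and standard deviations use N (not N − 1) as divisor.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation with N as divisor."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def corr(u, v):
    """Product moment r; a point biserial when one variable is scored 0/1."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) * sd(u) * sd(v))

# Hypothetical data: group membership (variable 1) and two measured variables.
group = [1, 1, 1, 1, 0, 0, 0, 0]
x2 = [12, 15, 11, 14, 9, 10, 8, 11]
x3 = [30, 28, 35, 33, 25, 31, 27, 24]

r12, r13, r23 = corr(group, x2), corr(group, x3), corr(x2, x3)

# Standard partial regression coefficients for two predictors.
beta2 = (r12 - r13 * r23) / (1 - r23 ** 2)
beta3 = (r13 - r12 * r23) / (1 - r23 ** 2)

# Relative weights: each beta divided by the S of its own variable.
w2, w3 = beta2 / sd(x2), beta3 / sd(x3)
total = [w2 * a + w3 * b for a, b in zip(x2, x3)]

multiple_r = math.sqrt(beta2 * r12 + beta3 * r13)
print(round(multiple_r, 4), round(corr(total, group), 4))
```

The point biserial between the weighted total and group membership reproduces the multiple r, and it is at least as large as either single-variable correlation. Multiplying both weights by any positive constant leaves this correlation unchanged.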


CORRELATION OF SUMS (OR AVERAGES) 


There are times when it is useful to have a formula for the correlation
between two variables, each of which is made up as the sum (or average) of
two or more variables. As an introduction to the problem, consider the
situation in which one variable is obtained by summing three parts, the
other by summing two. In deviation units, let

$$x = x_1 + x_2 + x_3 \quad \text{and} \quad y = y_4 + y_5$$

Then

$$r_{xy} = \frac{\Sigma xy}{N S_x S_y} = \frac{\Sigma (x_1 + x_2 + x_3)(y_4 + y_5)}{N S_x S_y}$$

$$= \frac{\Sigma x_1 y_4 + \Sigma x_1 y_5 + \Sigma x_2 y_4 + \Sigma x_2 y_5 + \Sigma x_3 y_4 + \Sigma x_3 y_5}{N S_x S_y}$$

Each term in the numerator when divided by N will yield an r times the
product of two Ss, and the value of $S_x$ and $S_y$ will each be given by the
square root of the variance of a sum of correlated scores, hence we have

$$r_{xy} = \frac{r_{14}S_1S_4 + r_{15}S_1S_5 + r_{24}S_2S_4 + r_{25}S_2S_5 + r_{34}S_3S_4 + r_{35}S_3S_5}{\sqrt{S_1^2 + S_2^2 + S_3^2 + 2r_{12}S_1S_2 + 2r_{13}S_1S_3 + 2r_{23}S_2S_3}\;\sqrt{S_4^2 + S_5^2 + 2r_{45}S_4S_5}}$$

[…], the correlations […] and the cross-correlations of […] six numerator
terms). This means that if […]able it is possible to obtain the correlation
[…] without ever computing a sum score for any of the N individuals.


Suppose x is made up of m variables and y of M variables (m not
necessarily equal to M):

$$x = x_1 + x_2 + x_3 + \cdots + x_m$$
$$y = y_I + y_{II} + y_{III} + \cdots + y_M$$

The numerator will contain $mM$ terms of the type $r_{iI}S_iS_I$ with
$i = 1, \ldots, m$ and $I = 1, \ldots, M$. These are the cross-correlations.
Under one radical sign we will have m variances of the type
$S^2_i$ plus either $m(m - 1)/2$ terms of the type $2r_{ij}S_iS_j$ with $i \neq j$ or $m(m - 1)$
terms like $r_{ij}S_iS_j$, and under the other radical we will have M variances of
the type $S^2_I$ and $M(M - 1)$ terms of the form $r_{IJ}S_IS_J$ with $I \neq J$. Instead
of using + signs to indicate the addition of the various terms, we can use $\Sigma$
to indicate the adding process. Accordingly, a general formula for the
correlation of two sums (or averages) can be written as

$$r_{\Sigma x\,\Sigma y} = \frac{\Sigma r_{iI}S_iS_I}{\sqrt{\Sigma S_i^2 + \Sigma r_{ij}S_iS_j}\;\sqrt{\Sigma S_I^2 + \Sigma r_{IJ}S_IS_J}} \qquad (i \neq j;\ I \neq J) \tag{12.15}$$

in which each $r_{ij}$ term and each $r_{IJ}$ term is added twice in the summing.

The r between $\Sigma x/m$ and $\Sigma y/M$ will also be given by (12.15), since
dividing by a constant does not change the degree of correlation. That is,
the correlation between averages is the same as the correlation between the
sums entering into the averages.
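The general formula can be checked against a direct computation on sum scores. The sketch below is illustrative only: the part scores are invented, the helper names are hypothetical, and standard deviations use N as divisor. It builds the correlation of two sums purely from the rs and Ss of the parts, as in (12.15), and compares it with the r obtained by actually summing the parts for each case.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def corr(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) * sd(u) * sd(v))

def variance_of_sum(parts):
    """Sum of the variances plus every r_ij * S_i * S_j with i != j (each pair twice)."""
    total = sum(sd(p) ** 2 for p in parts)
    total += sum(
        corr(p, q) * sd(p) * sd(q)
        for i, p in enumerate(parts)
        for j, q in enumerate(parts)
        if i != j
    )
    return total

def corr_of_sums(x_parts, y_parts):
    """Correlation of two sums from part statistics only, following (12.15)."""
    numerator = sum(corr(a, b) * sd(a) * sd(b) for a in x_parts for b in y_parts)
    return numerator / math.sqrt(variance_of_sum(x_parts) * variance_of_sum(y_parts))

# Hypothetical part scores for six cases: three x parts and two y parts.
x_parts = [[2, 4, 3, 5, 1, 6], [1, 3, 5, 2, 4, 6], [3, 1, 4, 6, 2, 5]]
y_parts = [[2, 3, 5, 4, 1, 6], [4, 2, 6, 5, 3, 1]]

sum_x = [sum(case) for case in zip(*x_parts)]
sum_y = [sum(case) for case in zip(*y_parts)]

direct = corr(sum_x, sum_y)
by_formula = corr_of_sums(x_parts, y_parts)
print(round(direct, 6), round(by_formula, 6))
```

Dividing each sum by its number of parts leaves both results unchanged, which is the point made above about averages.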

Formula (12.15) can be written in a different way if it is noted that any
sum can be replaced by the number of items being summed, multiplied by
their mean; e.g., $\Sigma X$ = N times the mean of the Xs. Thus each of the
sums in (12.15) can be replaced by the appropriate mean times the number
of terms being summed. Using an overhead bar to indicate a mean, we
have

$$r_{\Sigma x\,\Sigma y} = \frac{mM\,\overline{r_{iI}S_iS_I}}{\sqrt{m\,\overline{S_i^2} + m(m - 1)\,\overline{r_{ij}S_iS_j}}\;\sqrt{M\,\overline{S_I^2} + M(M - 1)\,\overline{r_{IJ}S_IS_J}}} \tag{12.16}$$

Consider now a first special case of formula (12.16). Suppose that all
m of the $X_i$ are parallel measures of trait X and that all M of the $Y_I$ are
parallel measures of trait Y. By definition of parallel measures, the
following hold:

(1) $S_1 = S_2 = \cdots = S_m$
(2) $S_I = S_{II} = S_{III} = \cdots = S_M$
(3) all $r_{ij}$ are equal to one another
(4) all $r_{IJ}$ are equal to one another

and we would expect

(5) all $r_{iI}$ to have the same value

Under these conditions, the $S_i$ will cancel, the $S_I$ will cancel, $\bar{r}_{ij}$ will be the
same as any one $r_{ij}$, $\bar{r}_{IJ}$ will equal any $r_{IJ}$, and $\bar{r}_{iI}$ will equal any $r_{iI}$. This
leads to

$$r_{\Sigma x\,\Sigma y} = \frac{mM\,r_{iI}}{\sqrt{m + m(m - 1)r_{ij}}\;\sqrt{M + M(M - 1)r_{IJ}}} \tag{12.17}$$

Dividing both numerator and denominator by mM, we find

$$r_{\Sigma x\,\Sigma y} = \frac{r_{iI}}{\sqrt{\dfrac{1}{m} + \left(1 - \dfrac{1}{m}\right)r_{ij}}\;\sqrt{\dfrac{1}{M} + \left(1 - \dfrac{1}{M}\right)r_{IJ}}}$$


Next consider what happens as we allow both m and M to approach infinity.
Both 1/m and 1/M become zero, leaving 1 times $r_{ij}$ and 1 times $r_{IJ}$ under
the radicals. Thus the correlation between two sums (or averages), each
based on an infinite number of parallel measures, becomes simply

$$r_{\Sigma x_\infty \Sigma y_\infty} = r_{\bar{x}_\infty \bar{y}_\infty} = \frac{r_{iI}}{\sqrt{r_{ij}\,r_{IJ}}} \tag{12.18}$$

But since $r_{ij}$ and $r_{IJ}$ are each correlations between parallel measures, each
is a reliability coefficient; and $r_{iI}$ is simply $r_{xy}$, the correlation between
any $X_i$ and $Y_I$. Therefore, for m and M infinitely large,

$$r_{\infty\infty} = \frac{r_{xy}}{\sqrt{r_{xx}\,r_{yy}}} \tag{12.19}$$

which is, as might have been anticipated, the correction for attenuation, or
formula (10.20).

As a second special case of (12.16), suppose M = m and the $X_i$ and $Y_I$
are all parallel measures of just one trait, say X. Again the Ss will cancel,
but now $r_{ij} = r_{IJ} = r_{iI} = r_{xx}$, and with M = m formula (12.17) becomes

$$r_{\Sigma x\,\Sigma y} = \frac{m^2 r_{xx}}{m + m(m - 1)r_{xx}} = \frac{m\,r_{xx}}{1 + (m - 1)r_{xx}} \tag{12.20}$$

If m = 2, this becomes $2r_{xx}/(1 + r_{xx})$, or the previously derived Brown-Spearman
formula (10.17) for the reliability of a test doubled in length.
In other words, formula (12.20) is the generalized Brown-Spearman
formula for a test increased m-fold in length. If a test of given length has
reliability $r_{xx}$, and we wish to know how much we would need to lengthen
the test to achieve a reliability of, say, .90 we simply solve (12.20) for m,
with $r_{\Sigma x\,\Sigma y}$ set to .90:

$$m = \frac{.90(1 - r_{xx})}{r_{xx}(1 - .90)}$$
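The two special cases lend themselves to a short numerical illustration. The sketch below assumes made-up reliabilities and correlations, and the function names are illustrative; it applies the generalized Brown-Spearman formula (12.20), solves it for the lengthening factor, and applies the correction for attenuation (12.19).

```python
import math

def lengthened_reliability(r_xx, m):
    """Reliability of a test made m times as long, formula (12.20)."""
    return m * r_xx / (1 + (m - 1) * r_xx)

def lengthening_needed(r_xx, target):
    """Value of m that brings reliability r_xx up to the target, from (12.20)."""
    return target * (1 - r_xx) / (r_xx * (1 - target))

def corrected_for_attenuation(r_xy, r_xx, r_yy):
    """Correlation between perfectly reliable measures, formula (12.19)."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values.
r_xx = 0.60
doubled = lengthened_reliability(r_xx, 2)
m_for_90 = lengthening_needed(r_xx, 0.90)
check = lengthened_reliability(r_xx, m_for_90)
disattenuated = corrected_for_attenuation(0.40, 0.64, 0.81)
print(round(doubled, 2), round(m_for_90, 2), round(check, 2), round(disattenuated, 4))
```

With a starting reliability of .60, doubling the test gives .75, and the test would have to be made about six times as long to reach .90.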


Chapter 13 


FREQUENCY COMPARISON: 
CHI SQUARE 


The quantity chi square ($\chi^2$), defined in the last chapter as

$$\chi^2 = \Sigma \frac{(O - E)^2}{E} \tag{12.11}$$

or as the sum of the squared discrepancies between observed and expected
frequencies, each divided by the expected frequency, is a statistic very
useful in a variety of problems involving frequencies. Let us begin by an
examination of what might be expected to happen if a penny were tossed
100 times. The expected frequency for heads is 50, and for tails is also 50.
If for a particular series of tosses we secured 55 heads and 45 tails, the
discrepancies would be +5 and −5. When these discrepancies are squared,
each becomes +25, and dividing each squared discrepancy by the expected
value we would have .5 + .5 = 1.0 as the value for $\chi^2$. Had we obtained
40 heads and 60 tails, the discrepancies of −10 and +10, when squared
and divided by E, would give 2 + 2 = 4 as $\chi^2$.

Three things are readily apparent from the aforementioned: first, the
greater the discrepancy relative to E, the greater the contribution to $\chi^2$;
second, the two parts being summed to obtain $\chi^2$ are not independent
(when the absolute discrepancy for heads is known, that for tails can be
inferred to be the same); and third, the squaring process means that $\chi^2$ is
always a positive quantity regardless of the direction of the discrepancies.
A fourth fact becomes apparent if we recall what happens when a series of
tosses is repeated. The number of heads (or tails) secured will vary from
one series of 100 tosses to the next; hence the amount of discrepancy will
vary, and therefore the magnitude of $\chi^2$ will vary from series to series. In
other words, successive sampling will yield varying values for $\chi^2$. If we
knew the sampling distribution for $\chi^2$, we could specify the probability of
securing by chance as large a value as any obtained $\chi^2$, and thereby we
could judge whether a given amount of discrepancy is significantly large
enough to warrant the conclusion that the coin is biased.

Situations similar to this arise in research work. We may, on the basis
of a hypothesis that a certain proportion of individuals possess a given
characteristic, state how many of a sample of N cases would be expected to
show the characteristic. Observations on N cases will provide an observed
number. If the hypothesis is tenable, the discrepancy between observed
and expected should be no larger than might arise on the basis of chance.
If the obtained discrepancy is too large, i.e., not apt to arise by chance, the
hypothesis becomes suspect. The student who recalls that the standard
error of a proportion can be used in comparing observed with expected
proportions may wonder whether another technique is necessary. The
answer will be forthcoming.


CHI SQUARE AND THE BINOMIAL DISTRIBUTION 

Perhaps some insight regarding the sampling distribution of $\chi^2$ can be
obtained by a re-examination of the binomial distribution, which was
discussed in Chapter 5. Suppose we consider the binomial distribution,
$(p + q)^{10}$ with $p = q = \frac{1}{2}$, as yielding the chance distribution of number of
heads when 10 unbiased coins are tossed (see Table 13.1). When 10 coins
are tossed we expect to get 5 heads and 5 tails, that is, the Es are 5 and 5, but
for a particular toss we will have an observed number of heads (and tails)
which may differ from 5 and 5. The observed values, or Os, could be 10
heads and zero tails; 9 heads, 1 tail; and so on to zero heads, 10 tails.
If we obtained 9 heads and 1 tail, we could write $\chi^2 = (9 - 5)^2/5
+ (1 - 5)^2/5 = 6.4$. Similarly, if we compute $\chi^2$ for 10 heads and no tails
we get a value of 10.0; for 8 heads and 2 tails we get 3.6; etc. Note that
for each $\chi^2$, $\Sigma E = \Sigma O = 10$.


Table 13.1. The binomial and $\chi^2$ when 10 coins are tossed

Number of
  Heads       $f_b$     $\chi^2$     f for $\chi^2$

    10          1       10.0            2
     9         10        6.4           20
     8         45        3.6           90
     7        120        1.6          240
     6        210        0.4          420
     5        252        0.0          252
     4        210        0.4
     3        120        1.6
     2         45        3.6
     1         10        6.4
     0          1       10.0

             1024                    1024


The third column of Table 13.1 gives the values of $\chi^2$ for various possible
sets of observed frequencies for number of heads and tails. All the given
numerical values of $\chi^2$, except 0, appear twice: 9 tails and 1 head will
obviously lead to the same $\chi^2$ as 9 heads and 1 tail. Now the probability
of obtaining 9 heads and 1 tail is 10/1024 and the P for 1 head and 9 tails is
also 10/1024; hence the P for obtaining a $\chi^2$ of 6.4 is 20/1024. Likewise,
we may combine the appropriate binomially derived chance frequencies
($f_b$) so as to write the chance frequencies for the several $\chi^2$ values. These
appear as the fourth column of the table. We have thus established the
chance or probability distribution of $\chi^2$ for a specified coin-tossing situation.
A plot of these frequencies against the $\chi^2$ values will reveal a highly
skewed distribution.

The probability of a $\chi^2$ as large as 6.4 will be 20/1024 + 2/1024, or
22/1024, a value which obviously represents the probability of a discrepancy,
between O and E, as great as 4 in either direction (at least 9 heads or at
least 9 tails). The P of 22/1024 involves 1 tail of the distribution of $\chi^2$ values,
but both tails of the binomial contribute thereto. This fact will need to
be recalled below when we discuss one- vs. two-tailed tests of hypotheses.

Before we leave Table 13.1, it might be well to point out a connection
between $\chi^2$ and $x/\sigma$. Consider again an obtained frequency of 9 heads.
If we express 9 as a deviation from the mean of the binomial, $np = 5$,
relative to the $\sigma$ of the binomial, $\sqrt{npq} = 1.581$, we have 4/1.581, which
when squared gives 6.401 or the corresponding value of $\chi^2$ (within limits of
rounding error). This agreement is not accidental; as will be seen shortly,
under specifiable conditions $\chi^2 = (x/\sigma)^2$. Another characteristic of $\chi^2$ is
obvious from Table 13.1: for the 10-coin situation no values of $\chi^2$ other
than those given can be obtained because the possible number of heads
(and tails) is a discrete series. This lack of continuity imposes a restriction
on the use of $\chi^2$ which will receive more attention as we proceed.
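The chance distribution of chi square for the 10-coin situation can be rebuilt from the binomial coefficients. The sketch below is illustrative (names are hypothetical); it uses exact fractions so that equal chi square values from the two tails of the binomial pool cleanly, and it also checks the link with the squared deviation over the binomial variance npq.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb

def chi_square_heads_tails(heads, n_coins):
    """Chi square for an observed split of heads and tails against an even split."""
    expected = Fraction(n_coins, 2)
    tails = n_coins - heads
    return (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected

n_coins = 10
total_outcomes = 2 ** n_coins  # 1024 equally likely patterns

# Pool the binomial frequencies that lead to the same chi square value.
frequency_of_chi2 = defaultdict(int)
for heads in range(n_coins + 1):
    frequency_of_chi2[chi_square_heads_tails(heads, n_coins)] += comb(n_coins, heads)

# Probability of a chi square at least as large as 6.4 (that is, 32/5).
p_at_least_6_4 = Fraction(
    sum(f for value, f in frequency_of_chi2.items() if value >= Fraction(32, 5)),
    total_outcomes,
)

# Squared deviation of 9 heads from np = 5, relative to npq = 2.5.
squared_standard_score = Fraction((9 - 5) ** 2) / (n_coins * Fraction(1, 4))

for value in sorted(frequency_of_chi2, reverse=True):
    print(float(value), frequency_of_chi2[value])
print(p_at_least_6_4, squared_standard_score)
```

Carried exactly, the squared standard score for 9 heads is 6.4, identical to the chi square; the 6.401 quoted above reflects only the rounding of the standard deviation to 1.581.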

The $\chi^2$ values in Table 13.1 are for possible discrepancies of observed
frequencies from an expected frequency of 5 for a single toss of 10 coins.
Suppose that we have, as shown in Table 13.2, an observed distribution of
frequencies obtained by tossing 7 coins 1000 times, and that we wish to
compare these observed frequencies with those expected on the basis of
the binomial expansion. We are not concerned this time with a single toss
for which the expectation would be 3.5, but rather with the results obtained
when a large number of tosses are made. Note that both the E column and
the O column sum to 1000 (or N) and that the (O − E)s sum to zero. The
several contributions to $\chi^2$ are given in the last column, which sums to
7.65, or the $\chi^2$ for the entire table. Two other series of 1000 tosses made by
students in the author's classes yielded $\chi^2$ values of 12.52 and 15.02. Two
of these values for $\chi^2$ are larger than any of the values in Table 13.1, and
one reason for this is the fact that more $(O - E)^2/E$ terms are being
summed: 8 such values instead of 2. Thus, the possible magnitude of a $\chi^2$
would seem to be a function of two things: […]
discrepancies (relative to their respective Es) and the number of categories
or possibilities for dis[…]


Table 13.2. $\chi^2$ for discrepancies of expected and observed
frequencies when 7 coins were tossed 1000 times

Number of
  Heads        E        O       O − E     $(O - E)^2/E$

    7          8        4        −4          2.00
    6         55       55         0           .00
    5        164      157        −7           .30
    4        273      283        10           .37
    3        273      267        −6           .13
    2        164      177        13          1.03
    1         55       45       −10          1.82
    0          8       12         4          2.00

  Sums      1000     1000         0          7.65
             (N)      (N)                 ($\chi^2$)
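The entries of Table 13.2 can be regenerated from the binomial expansion. The sketch below is a minimal illustration with hypothetical names; it assumes, as the table appears to, that the expected frequencies are rounded to whole numbers before the contributions are computed, and it also shows the value obtained when the expected frequencies are left unrounded.

```python
from math import comb

# Observed frequencies from Table 13.2, keyed by number of heads.
observed = {7: 4, 6: 55, 5: 157, 4: 283, 3: 267, 2: 177, 1: 45, 0: 12}
n_coins, n_tosses = 7, 1000

# Expected frequencies from the binomial with p = q = 1/2.
expected_exact = {k: n_tosses * comb(n_coins, k) / 2 ** n_coins for k in observed}
expected_rounded = {k: round(e) for k, e in expected_exact.items()}

def chi_square(obs, exp):
    """Sum of (O - E)^2 / E over the categories."""
    return sum((obs[k] - exp[k]) ** 2 / exp[k] for k in obs)

chi2_rounded = chi_square(observed, expected_rounded)
chi2_exact = chi_square(observed, expected_exact)
# Only the total N is fixed, so one less than the number of categories is free.
degrees_of_freedom = len(observed) - 1

print(expected_rounded)
print(round(chi2_rounded, 2), round(chi2_exact, 2), degrees_of_freedom)
```

With whole-number expected frequencies the total agrees with the tabled 7.65; with unrounded expectations it is slightly smaller, about 7.63.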


DEGREES OF FREEDOM 


We have seen that the $\chi^2$ of 6.4 in Table 13.1 involves two $(O - E)^2/E$
values: $(9 - 5)^2/5$ and $(1 - 5)^2/5$, or two discrepancies of exactly the
[…]. As soon as one is calculated, the other can be written down
at once without any further calculation; hence 1 degree of freedom exists.
If we study the data of Table 13.2, we see that, since the discrepancies
must sum to zero, all eight cannot be independent or vary freely. As soon
as seven are known, the eighth is determined. This means that there are 7
degrees of freedom for this situation. If we were to roll a die 600 times and
then compare the observed frequency for 6 spots, 5 spots, etc., with the
number expected on the basis of a perfectly homogeneous (unloaded)
cube, we would have five possible independent discrepancies, or 5 degrees
of freedom. In each of these situations the expected frequencies are determinable
on the basis of some a priori principle, and the only restriction is
that the total expected frequency must be the same as the total observed
frequency, i.e., $N_E$ must equal $N_O$. In all such cases the number of degrees
of freedom (df) is 1 less than the number of categories.
Table 13.3. $\chi^2$ and fourfold table
(Expected frequencies in parentheses)

               No           Yes          Totals
Group 1      50 (40)      50 (60)      100 = $N_1$
Group 2      70 (80)     130 (120)     200 = $N_2$
Totals      120          180           300 = N
            $N_n$        $N_y$


The df for other situations in which the $\chi^2$ technique is applicable will
follow the same principles as to the number of independent discrepancies,
but not the rule just laid down. Suppose we consider a 2 by 2 or fourfold
table such as that given in Table 13.3 (which contains fictitious data for
purpose of ease in exposition). The expected frequencies are set up on the
assumption that there is no difference between the two groups (the null
hypothesis). If this were the case, we would expect that the 180 yeses
would be distributed in the 1 to 2 ratio of the right-hand totals; likewise
the 120 noes. Note that the expected frequencies reading across, i.e.,
40 and 60, and 80 and 120, are proportional to the marginal totals at the
bottom. In determining the df, we can observe either of two things: first,
that all four discrepancies have the same absolute value, so that when one
is known the other three can be written down at once; or second, that in
setting up the expected frequencies, we are restricted by the requirement
that the two top-row values must sum to $N_1$, the next two must sum across
to $N_2$, the left-hand column must sum to $N_n$, and the next column to $N_y$;
as soon as the value 40 has been ascertained, the remaining three expected
values become fixed. Either way we look at the situation, we see that there
is but 1 degree of freedom even though there are four cells or four discrepancies.
The fundamental question is: How many of the discrepancies are
independent? In practice this can be answered by determining how many
[…] A rule-of-thumb for ascertaining the […]
contingency-type tables of k rows and […]
are utilized in setting up the expected frequencies
is to take $df = (k - 1)(l - 1)$. Thus for the fourfold table we have […]
1, and for the 3 by 3 table, $(3 - 1)(3 - 1) = 4$, etc.
[…]ot be square; in fact, very often the psychologist wishes
[…]asis of k possible responses to a […]
For this k by 2 table, the df becomes $(k - 1)(2 - 1)$, or simply $k - 1$.
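The two observations about the fourfold table (equal absolute discrepancies, and a single free cell) can be seen directly from the frequencies of Table 13.3. The sketch below is illustrative; the function names are hypothetical, and the degrees-of-freedom rule is applied as a rows-by-columns product.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Rule of thumb for contingency-type tables: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

# Observed frequencies of Table 13.3 (rows: Group 1, Group 2; columns: No, Yes).
observed = [[50, 50], [70, 130]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequencies under the null hypothesis of no group difference.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

discrepancies = [
    [o - e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected, discrepancies, chi2)
print(degrees_of_freedom(2, 2), degrees_of_freedom(3, 3), degrees_of_freedom(5, 2))
```

All four discrepancies equal 10 in absolute value, so fixing one fixes the rest; the chi square for the table works out to 6.25 with 1 degree of freedom.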


SAMPLING DISTRIBUTION OF $\chi^2$


[…] $\chi^2$ for various degrees of freedom, we can
specify the probability of obtaining a $\chi^2$ […]ervations do not agree with
[…]r more groups differ significantly or […]
for 1 degree of freedom, the distribution
of $\chi^2$ is the same as for $(x/\sigma)^2$. The
general equation for the $\chi^2$ distribution* […]
therefore there is no one $\chi^2$ distribution but a
[…]tions, one for each value of n. It happens […]
practical work seldom involves more than 30 degrees of freedom, so that
we need not concern ourselves with all possible distributions. Curves for
the distribution of $\chi^2$ can be drawn for various ns with $\chi^2$ along the abscissa
and the ordinates as the y values obtained by the equation in the footnote.
The area under each curve will be 1 unit, as in the unit normal curve.
Figure 13.1 contains curves for seven different values of n or df, so drawn
as to be comparable. Note that the shapes of these curves and their
general locations along the abscissa vary with n.

Fig. 13.1. Chi square distributions for various dfs: $\chi^2$ along abscissa.

* $$y = \frac{1}{2^{n/2}\,\Gamma(n/2)}\,(\chi^2)^{(n-2)/2}\,e^{-\chi^2/2}$$
in which $\Gamma$ indicates the gamma function as defined in texts in advanced calculus.
For n — 1, or for 1 degree of freedom, the cu 
(strictly speaking, it is asymptotic to the ordinate and 
infinity) and drops quite rapidly. For this curve the height or y value at 


rve starts very high 
hence starts at 


216 PSYCHOLOGICAL STATISTICS 

χ² = .16 is .92 (not shown). At χ² = .01, the height is more than four
times greater than .92. By the time we reach a χ² of 1.00, the height is
.242 (what x/σ value does this height correspond to when the unit normal
curve is considered?). Then the curve trails off until, at χ² = 6.25, the
height is about .007. Regardless of n, the right-hand parts of the curves
never reach the base line; i.e., they are asymptotic. If we think of the total
area under any curve as unity, the area between ordinates erected at any
two base-line points, or the area beyond any point, can be expressed as a
proportion of the total. Thus, for n = 1, .99 of the area is beyond (to the
right of) a χ² value of .000157, and only .05 is beyond 3.841. Stated
differently, the probability of obtaining a χ² value as large as 3.841 is .05;
for χ² as large as 6.635, P = .01; and the P = .001 point is at a χ² of 10.827.
These hold only for df = 1.
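The heights and areas quoted for n = 1 can be checked with a short sketch. It assumes the usual form of the χ² density, y = (χ²)^((n−2)/2) e^(−χ²/2) / [2^(n/2) Γ(n/2)], and uses the fact that for 1 degree of freedom the area beyond χ² equals the two-tailed normal area beyond its square root; the function names are hypothetical.

```python
import math


def chi2_pdf(x, n):
    """Height y of the chi-square curve at x (x > 0) for n degrees of freedom."""
    return x ** ((n - 2) / 2) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))


def chi2_tail_df1(x):
    """Area beyond x for 1 df: the two-tailed unit normal area beyond sqrt(x)."""
    return math.erfc(math.sqrt(x / 2))


for x in (0.01, 0.16, 1.00, 6.25):
    print(f"height at chi2 = {x}: {chi2_pdf(x, 1):.3f}")

for x in (3.841, 6.635, 10.827):
    print(f"area beyond chi2 = {x}: {chi2_tail_df1(x):.4f}")
```

The printed heights at .16, 1.00, and 6.25 should agree with the .92, .242, and .007 given in the text, and the three areas with .05, .01, and .001.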

The curve for n = 2 starts at a height of .50 and then descends, but less
rapidly than that for n = 1. It is readily seen that large values for χ² occur
more frequently when n = 2 than when n = 1. The P = .05 point is at
5.991; i.e., the probability of obtaining by chance a χ² value as great
as 5.991 is .05. The .01 point is at 9.210, and the .001 point is at
13.815.
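For n = 2 the area beyond a given χ² has the simple closed form e^(−χ²/2), which makes the quoted points easy to verify. The sketch below relies on that standard result (it is not stated in the text) and compares it with the 1-df area; the function names are hypothetical.

```python
import math


def chi2_tail_df2(x):
    """Area beyond x for 2 df (closed form exp(-x/2))."""
    return math.exp(-x / 2)


def chi2_tail_df1(x):
    """Area beyond x for 1 df, via the unit normal curve."""
    return math.erfc(math.sqrt(x / 2))


for x in (5.991, 9.210, 13.815):
    print(f"n = 2, chi2 = {x}: P = {chi2_tail_df2(x):.4f}")

# Large values occur more often with 2 df than with 1 df.
x = 3.841
print(chi2_tail_df1(x) < chi2_tail_df2(x))  # True
```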

For n = 3, the distribution curve begins at zero height, rises sharply to a
maximum (modal value) at χ² = 1, and then falls off so that the P = .01
point is at χ² = 11.341. As n is [...]
[...] value is at a χ² of n − 2.

The distributions of χ² for varying ns are theoretical probability
distributions. They may be interpreted as random sampling distributions, and
[...]

[...] difference is 1.96 times its standard error, the null
hypothesis becomes suspect; if 2.58 times its standard error, the
hypothesis of no difference can fairly safely be rejected; if D/S_D = 3.00,
[...] indicated. These three zs, it will be recalled,
correspond to the .05, the .01, and the .003 levels of significance, for
two-tailed tests.

Now χ² can likewise be used to test the null hypothesis. The essential
difference between the D/S_D and the χ² techniques is that the latter involves
skewed probability distributions; but, knowing the distribution for a given
n, we can ascertain the necessary value of χ² for the .05, the .01, the .001, or
other levels of significance. The statement of the null hypothesis in
connection with χ² may vary slightly according to the given situation. If the
frequencies in the universe agree with the a priori expected frequencies, if the
frequencies in two or more universes are the same, if there is zero
association in the universe between two classifications or variables, if any such
conditions hold for the universe or universes, then successive samplings
will yield χ² values which will distribute themselves in a determinable
manner, thus permitting us to specify the probability of obtaining by
chance a χ² value as large as any given or obtained value. When this
probability is small, say .01 or less, the null hypothesis is rejected, and its
rejection implies that real discrepancies or real differences exist
or that there is a real association.

Since the random sampling distribution of χ² depends on the df, which
varies from situation to situation, it is not feasible to give a rule-of-thumb
criterion in terms of the magnitude of χ² which would be deemed
significant. If we adopt P = .01 as the level of significance we wish to
attain, we need to refer to available tables of χ² in order to find how large
χ² must be to correspond to this level; likewise for any other chosen level of
significance. Probability tables for χ² are available in two forms. One form,
Fisher's (see Table D of the Appendix), gives the values of χ² which will be
exceeded by chance a specified proportion of the time, such as .10, .05, .01,
and .001. Elderton's table† gives the probabilities for obtaining chi squares
as large as specified values expressed as integers, such as 1, 2, 3, ..., 21, 22.
Both tables include varying degrees of freedom. Because of an early
erroneous notion as to the meaning of degrees of freedom, Elderton's table
must be entered with df equal to 1 less than his n' values, e.g., use n' = 4
when n or df = 3. Elderton's table has one advantage over that given in
our Appendix: P values as small as .000001 can be ascertained.

† Table XII in Pearson, Karl, Tables for statisticians and biometricians,
part I, Cambridge: Cambridge University Press, 1931.

For ns larger than 30, the expression √(2χ²) − √(2n − 1) will have a
sampling distribution which will follow very closely the unit normal curve.
The probability is accordingly .05 that this expression will exceed +1.64,
and .01 that it will exceed +2.33, by chance.

Before the applications of χ² are summarized, a word should be said
about the underlying assumptions which restrict its usage. In the
derivation of the equation for the χ² distribution(s) it is assumed that the
sample discrepancies, or deviations of observed from expected frequencies,
follow a normal distribution. In practical applications this assumption can
easily be violated in two senses: skewed distribution for O − E values and lack
of continuity. If E is small, say, equal to 2, the Os are restricted on
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one side of E to zero and 1, whereas on the other side the possible values 
may be 3, 4, 5, and upward. Such a curtailment ordinarily leads to a
skewed distribution of the observed frequencies (if the other side were
restricted to just 3 and 4, symmetry could exist). Now it is obvious that
when E is small we have a greater degree of discontinuity, hence the
sampling distribution of the observed frequencies (and therefore, of O − E
values) will be discrete instead of continuous as required for the normal
curve. Even for the situation involving [...] Os about E, such [...]
observed χ² value [...] in the approximation [...]
a correction for continuity will better the [...] observed one. Even so,
with df [...] (correction formulas yet to [...]) [...] an E is less than 5.

[...] sample χ²s, will show relatively little discontinuity [...] saw, for a
situation involving E = 5 and [...] six values, whereas for the foregoing dice
[...] (O − E)²/E ratios permits a large number of [...] χ²s, or a greater
approach to continuity [...] to the approximation of the discontinuous [...]
normal distribution, which approximation [...] as n increases (or as the
n + 1 possible [...]

Although discontinuity as an aspect of the violation of the assumption
of normality for Os about small E is not serious when the df is more than
5 or 6, there remains the question of the effect of possible skewness when
Es are small. There is evidence that, when df is not small, Es as low as
2 will not produce misleading χ² values.

A second assumption is that the observations be independent of one
another. This assumption is violated when the total of the observed
frequencies exceeds the total number of persons in the sample(s). Such an
inflation of N occurs when multiple observations are made on each person
and each person is counted more than once (cf. p. 93).


APPLICATIONS 


The chief situations for which it is permissible to use χ² may be classified
into three types.

1. The discrepancy of observed frequencies from frequencies expected 
on the basis of some a priori principle. Such situations are most frequently 
found in genetics, wherein it is hypothesized that certain crossings should 
lead to the presence, in a certain proportion of offspring, of some defined 
characteristic or variation thereof. The frequency table for such situations 
is 1 by k, with k — 1 degrees of freedom, since the only restriction is that 
the expected frequencies must sum to N. This type of situation does not 
arise often in research in the social sciences.

2. Contingency tables. Here we have two types of situations which 
differ only in the methods of classifying.

а. We may have a contingency table which is analogous to a correlation 
table in that both classifications are based on continuous or ordered 
discrete variables for which we have only categorized information for N 
individuals. The two variables might be in dichotomy (fourfold table), or 
one might be a dichotomy and the other manifold, or both might involve 
multiple categories. For these contingency tables it is meaningful to 
speak of the correlation between the two variables, and the degree of
correlation might be appropriately specified by the tetrachoric r or the
fourfold point r or the contingency coefficient (corrected or uncorrected);
which measure is used depends upon meeting the requisite assumptions.
Insofar as we are concerned only with χ², we have the means for testing the
significance of the correlation or association as a chance departure from
zero or no relationship, and the significance test can be used without
knowledge of the degree of correlation. Such a test of significance is
sometimes spoken of as a test of independence: are the two classifications
independent? If so, χ² should be no larger than would arise by chance. If
we have evidence for correlation or a lack of independence from the χ²
technique, we can proceed to calculate an appropriate coefficient for
measuring the degree of correlation or the strength of association. The student
should, as an exercise, convince himself that χ² per se is not a measure of
association.

b. The other contingency-type situation involves classification into
categories for one variable vs. classification into unordered groups for the
other, or one unordered grouping vs. another. The fundamental problem
is apt to be that of comparing two or more groups with regard to multiple
responses; i.e., we want a test of the difference between groups rather than
a measure of correlation, which would not be entirely meaningful except
in the loose sense that a particular response is associated more often with a
particular group. As previously stated, the df for a k by l contingency
table is (k − 1)(l − 1).

3. Goodness of fit. If we wish to check on whether it is reasonable to
believe that a given frequency distribution is, within the limits of chance
sampling, of the normal or some other specified type, a frequency curve
having the same basic constants (e.g., N, M, and S for the normal curve)
as those computed from the observed frequency distribution can be fitted
to the data. If a normal curve is being fitted, the table of normal curve
functions is used to set up the theoretical or expected frequencies for the
several grouping intervals. Then χ² can be computed in the usual manner.
The df will correspond to the number (k) of grouping intervals less the
number of constants derived from the data and used in the fitting process.
For the normal curve the observed and theoretical distributions are made
to agree as to N, M, and S; hence df = k − 3. An attempt will be made
later to explain the reasoning back of the determination of df when
checking the goodness of fit of frequency curves.
Fourfold contingency tables. For illustrative purposes, let us first
apply χ² to a couple of 2 by 2 contingency tables for which the tetrachoric
r, as well as the contingency coefficient, is an appropriate measure of the
degree of correlation. Before we do this, it might be well to recall that χ²
for a fourfold table can be computed by a simple formula which does not
require calculation of the four expected frequencies. Let the fourfold
frequencies and marginal totals be set up as in Table 13.4. Chi square can
be computed from

$$ \chi^2 = \frac{N(BC - AD)^2}{(A + B)(C + D)(A + C)(B + D)} \tag{13.1} $$

Table 13.4. Setup for computing χ² from a
fourfold table by means of a formula

    A        B        A + B
    C        D        C + D
    A + C    B + D    N

This is simpler than calculation from the discrepancies between observed
and expected frequencies. The requisite that no expected frequency shall
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be less than 5 still holds. A quick check on this can be obtained by multi- 
plying the smaller right-hand marginal frequency by the smaller frequency 
on the bottom margin and dividing the product by N. This will yield the 
smallest expected frequency. In Table 13.5 will be found two fourfold
tables for Stanford-Binet items. Direct substitution into formula (13.1)
yields the two chi squares at the bottom of the table. The P values are
approximately .01 and less than .001, respectively. We can be reasonably
sure that there is some correlation between the first two items, and fairly
certain that items 3 and 4 are correlated. The value of the tetrachoric r is
.40 for each table, and the contingency coefficient (with no corrections) is
.24 for each table. Thus we see that the χ² Ps associated with the same
degree of correlation can be different. Why? Would it be possible for two
fourfold tables to yield the same χ² P, yet differ in the degree of
relationship?

Table 13.5. χ² applied to contingency (fourfold) tables

                  Item 1                              Item 3
                −     +    Total                    −     +    Total
    Item 2  +   29    39     68         Item 4  +   34    37     71
            −   22    10     32                 −   94    35    129
         Total  51    49    100              Total 128    72    200

            χ² = 5.93                           χ² = 12.40
            P about .01                         P less than .001
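The two chi squares of Table 13.5 can be reproduced by direct substitution into formula (13.1). The sketch below also applies the quick check for the smallest expected frequency and computes the uncorrected contingency coefficient, taken here as √[χ²/(N + χ²)]; that definition is the customary one and is an assumption, since it is not restated in this section. Function names are hypothetical.

```python
import math


def fourfold_chi2(a, b, c, d):
    """Chi square for a fourfold table laid out as in Table 13.4, formula (13.1)."""
    n = a + b + c + d
    return n * (b * c - a * d) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def smallest_expected(a, b, c, d):
    """Smaller row total times smaller column total, divided by N."""
    n = a + b + c + d
    return min(a + b, c + d) * min(a + c, b + d) / n


def contingency_coefficient(chi2, n):
    """Uncorrected contingency coefficient (assumed definition)."""
    return math.sqrt(chi2 / (n + chi2))


for cells in [(29, 39, 22, 10), (34, 37, 94, 35)]:
    n = sum(cells)
    chi2 = fourfold_chi2(*cells)
    print(f"N = {n}: chi2 = {chi2:.2f}, "
          f"smallest E = {smallest_expected(*cells):.2f}, "
          f"C = {contingency_coefficient(chi2, n):.2f}")
```

Both tables give a coefficient of about .24, yet the larger N of the second table roughly doubles its χ².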

Table 13.6. χ² used to test sex differences in passing (+) or failing (−) a Binet item

    Age           6                 7                 8                 9
              +    −  Total     +    −  Total     +    −  Total     +    −  Total
    B        84   18   102     66   36   102     58   44   102     37   66   103
    G        93    8   101     80   20   100     62   39   101     52   49   101
    Total   177   26   203    146   56   202    120   83   203     89  115   204

    χ²          4.30              5.89               .43              5.02
    P          <.05              <.02              <.50              <.05

Another application of χ² to fourfold tables is given in Table 13.6, in
which the sexes at four age levels are compared in performance on a
Stanford-Binet item. None of the χ² values reaches 6.635, the value
corresponding to the .01 level of significance, but three of them are large enough
to suggest a real sex difference. That a real difference may exist is also
suggested by the fact that the boys are consistently superior at all four age
levels. This brings us to an important property of χ². The several chi
squares for independent (i.e., based on different samples) tables may be
summed to a total χ², with df equal to the sum of the dfs for the chi squares
being summed. Thus for Table 13.6 we have 4.30 + 5.89 + .43 + 5.02
= 15.64 as a χ² based on 4 degrees of freedom, by which we can judge the
significance of the over-all sex differences shown in the four tables. With
χ² = 15.64 and n = 4, we find (from Table D) that P is less than .01 (for
n = 4, a χ² of 13.28 corresponds to the .01 level). From Elderton's tables
it can be ascertained that P is about .004. In other words, as great a sex
difference, considering all four age groups, would arise 4 times in 1000 by
chance; hence it would be concluded that a real difference does exist for
this item.
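The summing of independent chi squares can be illustrated with the four age groups of Table 13.6. In this sketch the tail area for an even number of degrees of freedom is computed from the standard closed form e^(−χ²/2) Σ (χ²/2)^j / j!, which stands in for a lookup in Elderton's table; the function names are hypothetical.

```python
import math


def fourfold_chi2(a, b, c, d):
    """Chi square for a fourfold table, formula (13.1)."""
    n = a + b + c + d
    return n * (b * c - a * d) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_tail_even_df(x, n):
    """Area beyond x for an even number n of degrees of freedom."""
    assert n % 2 == 0, "closed form used here holds for even df only"
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(n // 2))


# Cells (A, B, C, D) per age: first row pass/fail, second row pass/fail.
tables = {6: (84, 18, 93, 8), 7: (66, 36, 80, 20),
          8: (58, 44, 62, 39), 9: (37, 66, 52, 49)}

values = {age: fourfold_chi2(*cells) for age, cells in tables.items()}
total = sum(values.values())
df = len(values)  # each fourfold table contributes 1 df
p = chi2_tail_even_df(total, df)

for age, v in values.items():
    print(f"age {age}: chi2 = {v:.2f}")
print(f"total chi2 = {total:.2f} on {df} df, P = {p:.4f}")
```

The total should come out near 15.64 with P in the neighborhood of .004, in line with the figures quoted from Table D and Elderton's tables.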

This combinatorial property of χ² is important for all situations where
frequency data from different groups cannot first be legitimately combined
because of age or other differences. It is most useful when consistency is
present among several comparisons, none of which taken singly possesses
statistical significance. However, neither consistency nor insignificance for
single comparisons constitutes a requisite for using the sum of chi squares
as an over-all test of significance or as a means of arriving at one summary
probability figure.

A rigorous proof of the proposition that a sum of independent χ²s will
yield a new χ² that follows the χ² distribution with df equal to the sum of the
dfs for the χ²s being summed is beyond the level of this book. Perhaps the
reasonableness of the proposition can be seen from the following argument.
[...]
If we added the two χ² values to get χ²_t = χ²_a + χ²_b, we would have exactly
[...] and χ²_b, we would get [...] as χ²_t = χ²_a + χ²_b. Now in determining the
degrees of freedom for the [...] the ten ratios, we need
to look at the restrictions within [...] sets in order to specify how
many of the (O − E) deviations [...]
[...] independent deviations in set a plus [...]
total number of independent deviations in the two sets combined. But
this total number is simply the sum of the dfs for the separate sets. Stated
differently, χ² is not conscious of how it was computed, whether by first
getting partial sums which are then added or by proceeding directly to a
total sum.


Table 13.7. Schema for comparing groups via χ² and via difference between
proportions (or percentages)

    Frequencies
                   +          −         Total
    Group 1        A          B         N₁
    Group 2        C          D         N₂
    Total        A + C      B + D       N

    Proportions
                   +                  −
    Group 1      p₁ = A/N₁          q₁ = B/N₁          p₁ + q₁ = 1.0
    Group 2      p₂ = C/N₂          q₂ = D/N₂          p₂ + q₂ = 1.0
    Total        p̄ = (A + C)/N      q̄ = (B + D)/N      p̄ + q̄ = 1.0


The single age comparisons in the foregoing example could, of course,
be made by means of proportions. This could be done by formula (5.6),
the discussion of which should be reviewed at this time. Let us examine
the connection between the χ² technique and the D/S_D for proportions
method of testing the significance of the difference between two groups, the
individuals of which have been classified as either passing or failing, saying
either yes or no, possessing or not possessing a characteristic, etc. All such
comparisons begin with a fourfold frequency table of the type symbolized
in Table 13.4, or an equivalent (the frequencies may have been recorded for
only one category of the dichotomy, say the yeses, from which the
frequencies for the other category may be readily inferred by subtraction).
Table 13.7 contains the basic table of frequencies for the presence (+) or
absence (−) of a characteristic for groups 1 and 2, and the basic table of
proportions obtained by dividing the frequencies by the proper Ns is
indicated. Note that the p̄ and q̄ values on the bottom margin are the
proportions to use in formula (5.6) for the standard error of the difference
between p₁ and p₂. Note also that p₁ = A/N₁ = A/(A + B) and that
p₂ = C/N₂ = C/(C + D).
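The connection between the two techniques can be checked numerically. The following is a minimal sketch, assuming that formula (5.6) takes the standard error of the difference as √(p̄q̄/N₁ + p̄q̄/N₂) with p̄ and q̄ from the bottom margin, as described above; the function names are hypothetical, and the frequencies are those for age 6 in Table 13.6.

```python
import math


def squared_critical_ratio(a, b, c, d):
    """(D / S_D)^2 for two independent proportions, pooled p-bar and q-bar (assumed form of 5.6)."""
    n1, n2 = a + b, c + d
    n = n1 + n2
    p1, p2 = a / n1, c / n2
    p_bar, q_bar = (a + c) / n, (b + d) / n
    variance = p_bar * q_bar / n1 + p_bar * q_bar / n2
    return (p1 - p2) ** 2 / variance


def fourfold_chi2(a, b, c, d):
    """Chi square for the same fourfold table, formula (13.1)."""
    n = a + b + c + d
    return n * (b * c - a * d) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


cells = (84, 18, 93, 8)
z_squared = squared_critical_ratio(*cells)
chi2 = fourfold_chi2(*cells)
print(f"(D/S_D)^2 = {z_squared:.4f}, chi2 = {chi2:.4f}")
print(math.isclose(z_squared, chi2, rel_tol=1e-9))  # True
```

The two quantities agree to rounding error, so the critical ratio itself is simply the square root of the fourfold χ².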
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In order to avoid carrying along a square root sign or radical, and for 
another reason which if not now obvious will soon become so, let us write
the square of the expression for the critical ratio of the difference between
the two proportions, p₁ and p₂, thus,

$$ \frac{D^2}{S_D^2} = \frac{(p_1 - p_2)^2}{\dfrac{\bar{p}\bar{q}}{N_1} + \dfrac{\bar{p}\bar{q}}{N_2}} $$

When we replace all the proportions by their equivalents involving
frequencies and the proper Ns and also substitute frequencies for N₁ and
N₂, we have

$$
\begin{aligned}
\frac{D^2}{S_D^2}
&= \frac{\left[\dfrac{A}{A+B} - \dfrac{C}{C+D}\right]^2}
        {\dfrac{[(A+C)/N]\,[(B+D)/N]}{A+B} + \dfrac{[(A+C)/N]\,[(B+D)/N]}{C+D}} \\[6pt]
&= \frac{\dfrac{(AC + AD - AC - BC)^2}{[(A+B)(C+D)]^2}}
        {\dfrac{(A+C)(B+D)(C+D) + (A+C)(B+D)(A+B)}{N^2 (A+B)(C+D)}} \\[6pt]
&= \frac{(AD - BC)^2 N^2}
        {(A+B)(C+D)\,[(A+C)(B+D)(C+D) + (A+C)(B+D)(A+B)]} \\[6pt]
&= \frac{(AD - BC)^2 N^2}{(A+B)(C+D)(A+C)(B+D)(A+B+C+D)} \\[6pt]
&= \frac{(AD - BC)^2 N}{(A+B)(C+D)(A+C)(B+D)}
\end{aligned}
$$

which equals χ² as given by formula (13.1). This
confirms a fact already mentioned, that for [...]
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Table 5.2, which is here reproduced with minor changes as Table 13.8, 
provides a means of testing the significance of the association or correlation 
between two sets of responses, such an application does not test the signifi- 
cance of change from the first to the second set of responses. This latter 
test can be made by means of formula (5.5). It is also possible to test the 
significance of any found change by the use of χ². To do this, we first note
that a net change for the group must necessarily involve the difference
between the frequencies, A and D, since the B and C cases represent those
who showed no change.

Table 13.8. Fourfold table of frequencies and proportions for a first set vs. a
second set of responses from the same individuals

              Frequencies                         Proportions
                  2nd                                 2nd
               −       +                           −       +
    1st  +     A       B     A + B      1st  +     a       b      p₁
         −     C       D     C + D           −     c       d      q₁
             A + C   B + D     N                   q₂      p₂     1.0

The null hypothesis would be that the universe
frequencies are not different; i.e., for a given sample, A and D would differ
only as a result of chance sampling. Since A + D represents the total
number of individuals who changed (the As from + to −, and the Ds from
− to +), in setting up the null hypothesis concerning the net change it
would seem appropriate to say that, if A + D individuals changed,
(A + D)/2 would change in one direction and (A + D)/2 in the other
direction. Thus (A + D)/2 would become the expected frequency; then
A − (A + D)/2 and D − (A + D)/2 would become the discrepancies
between observed and expected (on the basis of the null hypothesis)
frequencies. If A = D, both discrepancies would become zero. Squaring
each discrepancy and dividing by E and then summing the two quotients
or doubling either one will give a χ² which is based on 1 degree of freedom
(why 1 degree of freedom?). A little algebraic manipulation shows that

$$ \chi^2 = \frac{(A - D)^2}{A + D} \tag{13.2} $$

for the particular situation in which we wish to test the significance of
over-all changes.

Comparison of formula (13.2) with formula (5.3) shows that we again
have a χ², with 1 degree of freedom, which equals the square of an x/σ.
The reasoning back of the statement given on p. 56 that formulas (5.3),
(5.4), and (5.5) are inapplicable unless A + D equals 10 or more should
now be clearer to the reader. If A + D were less than 10, the two Es
would be less than 5, an acceptable though none too conservative lower
limit for E. A correction (for continuity) needed when the Es are smaller
than 10 will be given shortly. One thing which may puzzle the reader at
this time is the fact that formula (13.2) does not contain a total N. Its
algebraic equivalent, (D/σ_D)², with σ_D calculated by formula (5.5), does
contain N, so the absence of N from (13.2) is more apparent than real.

The advantage of the χ² over the D/σ technique for testing the
significance of net changes in responses lies in the fact that χ² values for two or
more groups which have been used in an experiment can be summed to a
new χ² with n equal to the sum of the separate dfs; in this case n equals the
number of chi squares being summed.

Formula (13.2) is, of course, not restricted to situations involving
changes in responses. If we have the same individuals giving, say, yes or
[...] questions and we desire to test the significance
[...] frequencies (or proportions) of yeses or noes,
[...] respectively. Since N = 100,
[...] per cent). By formula (13.2)
we have χ² = (29 − 10)²/(29 + 10) = 9.26, which for 1 degree of freedom
falls between the .01 and .001 levels of significance; hence, it would be
[...] different in difficulty. If we use

$$ \frac{p_1 - p_2}{\sigma_D} = \frac{.68 - .49}{\sqrt{(.10 + .29)/100}} $$

[...] Yates' correction can be incorporated
in formula (13.1), which becomes

$$ \chi^2 = \frac{N\left(|BC - AD| - N/2\right)^2}{(A + B)(C + D)(A + C)(B + D)} \tag{13.3} $$

and indicates that the absolute difference between AD and BC is to be
reduced by N/2. Formula (13.2) can also be written to include a correction
for continuity. The corrected form

$$ \chi^2 = \frac{\left(|A - D| - 1\right)^2}{A + D} \tag{13.4} $$

involves decreasing the absolute value of the difference between A and D
by 1. Formula (13.4) is to be preferred to (13.2) when A + D is less than
20. The reasoning back of Yates' correction is precisely the same as that
given on p. 45 of Chapter 5.
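The test for net change and the two continuity corrections can be sketched as follows. The formulas are taken as χ² = (A − D)²/(A + D) for (13.2), with 1 subtracted from |A − D| for (13.4), and with |BC − AD| reduced by N/2 in the fourfold formula; the function names are hypothetical, and the frequencies 29 and 10 are those of the worked example.

```python
def change_chi2(a, d, corrected=False):
    """Chi square (1 df) for net change; optionally with the continuity correction."""
    diff = abs(a - d)
    if corrected:
        diff = max(diff - 1, 0)  # reduce the absolute difference by 1
    return diff ** 2 / (a + d)


def fourfold_chi2_yates(a, b, c, d):
    """Fourfold chi square with |BC - AD| reduced by N/2 (Yates' correction)."""
    n = a + b + c + d
    diff = max(abs(b * c - a * d) - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


print(f"uncorrected: {change_chi2(29, 10):.2f}")                 # 9.26
print(f"corrected:   {change_chi2(29, 10, corrected=True):.2f}")  # 8.31
print(f"Yates, first table of Table 13.5: {fourfold_chi2_yates(29, 39, 22, 10):.2f}")
```

Since A + D = 39 here, the correction is not strictly called for by the rule just given, but the comparison shows how much it lowers χ².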

One-tailed vs. two-tailed test. It will be recalled from our discussion
of the sampling distribution of χ² that the Ps obtainable from Table D are
the probabilities of the chance occurrence of as large a χ² as that observed;
that is, levels of significance such as P = .05 or .01 or .001 are based on one
(the right-hand) tail of the sampling distribution of χ². Does this mean
that it is a one-tailed test in the hypothesis testing sense discussed earlier
(pp. 61-63)? Let us recall a couple of facts. First, when using the
binomial to indicate something of the nature of the χ² distribution we saw
that both tails of the binomial were combined as one tail of the χ²
distribution. Second, for 1 degree of freedom χ² = (x/σ)². Now an x/σ of 1.96
corresponds to the P = .05 level as a two-tailed test. The square of 1.96
gives a χ² of 3.84, which we can see from Table D also corresponds to the
.05 level. Hence the Ps, for 1 degree of freedom, read from Table D are
equivalent to those based on the two-tailed test despite the fact that only
one tail of the χ² distribution is involved.

If the decision to be made or the hypothesis to be tested calls for a
one-tailed test, the Ps from Table D need to be halved: a χ² of 5.41
(instead of 6.64) is required for the .01 level, and a χ² of 2.71 (instead of
3.84) gives the .05 level. Incidentally, for 1 degree of freedom, a P for a
given χ² can obviously be obtained by entering its square root into the normal
curve table; whether such a P from x/σ is based on one or both tails of the
normal distribution depends on the hypothesis being tested. As we
proceed, the student should convince himself that the notion of direction
of differences, hence the idea of a one-tailed test, does not make sense in
other applications of χ².
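The halving rule for 1 degree of freedom can be illustrated by obtaining P from the square root of χ² and the unit normal curve. This is only a sketch; the normal tail area is computed with the complementary error function from Python's standard library rather than read from a table, and the function names are hypothetical.

```python
import math


def p_two_tailed(chi2):
    """P for 1 df as read from a chi-square table: both normal tails beyond sqrt(chi2)."""
    return math.erfc(math.sqrt(chi2 / 2))


def p_one_tailed(chi2):
    """Half the tabled P, appropriate only when the direction was specified in advance."""
    return p_two_tailed(chi2) / 2


print(f"chi2 = 1.96^2 = {1.96 ** 2:.2f}: two-tailed P = {p_two_tailed(1.96 ** 2):.3f}")
for chi2 in (2.71, 5.41):
    print(f"chi2 = {chi2}: one-tailed P = {p_one_tailed(chi2):.3f}")
```

The values 2.71 and 5.41 should give one-tailed Ps close to .05 and .01, matching the halved levels quoted above.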
Comparison of two or more correlated proportions. Formula (13.2)
has been extended to provide a method for testing whether three or more
nonindependent proportions (or sets of frequencies) differ significantly
among themselves. For example, we may have pass-fail (or yes-no, or
some other dichotomous) information on C items (or questions) for N
individuals; or we may have only one item with responses from N persons
under C different conditions; or one item with responses from N sets of C
matched persons each, that is, C matched groups.

Data from such situations can be arranged in a table consisting of N
rows and C columns. The total number of passes (yeses) in a given
column divided by N will, of course, be the proportion of passes (or yeses)
in that column. Do these C proportions (or the totals) differ significantly
in an over-all sense? The null hypothesis is that all the proportions are the
same except for chance. To test the null hypothesis we will need to obtain
not only the column totals (number of passes) but also a similar total for
each of the N rows. Let T stand for the total in any column and X stand
for the total in any row. This X is a sort of "score" for the person: his
number of passes (or yeses) on the C items. The sampling distribution of
the quantity

$$ Q = \frac{(C - 1)\left[C \sum T^2 - \left(\sum T\right)^2\right]}{C \sum X - \sum X^2} \tag{13.5} $$

follows the χ² distribution with C − 1 degrees of freedom for N large
(N > 30, presumably).

The computation of Q [...] obtained Q exceeds the [...]
conclude that the (correlated) proportions [...];
that is, they are not homogeneous.
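The computation of Q can be sketched as below. The data are hypothetical (six persons, three items, 1 = pass, 0 = fail) and are far too few for the χ² approximation to be trusted; they serve only to show how the column totals T and row totals X enter the formula. The function name is also hypothetical.

```python
def cochran_q(rows):
    """Q for an N by C table of 0/1 responses; returns (Q, df) with df = C - 1."""
    c = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(c)]  # the Ts
    row_totals = [sum(row) for row in rows]                       # the Xs
    numerator = (c - 1) * (c * sum(t * t for t in col_totals) - sum(col_totals) ** 2)
    denominator = c * sum(row_totals) - sum(x * x for x in row_totals)
    return numerator / denominator, c - 1


# Hypothetical responses, one row per person, one column per item.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
]
q, df = cochran_q(responses)
print(q, df)  # 6.0 2
```

Persons who pass all items or fail all items contribute nothing to the denominator, so they do not affect Q.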


$$ \chi^2 = \frac{N^2}{AB}\left[\sum \frac{B_i^2}{A_i + B_i} - \frac{B^2}{N}\right] \tag{13.6} $$

[...] A_i, B_i, A, or B, [...] meanings indicated in Table 13.9,
[...] two groups classified according
[...] computations required by
[...] table. Note that, as usual, the
[...] across and down. Column D
[...] bracketed part of the formula.

Table 13.9. The calculation of χ² from a 2 by k table: two groups
and k (= 5) responses

                 Col. A         Col. B         Col. C       Col. D             Col. E
    Response     Group I        Group II       A_i + B_i    B_i/(A_i + B_i)    B_i²/(A_i + B_i)
    1             27 (= A₁)      15 (= B₁)        42           .3571              5.36
    2             26 (= A₂)      16 (= B₂)        42           .3810              6.10
    3            247 (= A₃)     110 (= B₃)       357           .3081             33.89
    4             41 (= A₄)       8 (= B₄)        49           .1633              1.31
    5             39 (= A₅)      15 (= B₅)        54           .2778              4.17
                                                                                 50.83
    Totals       380 (= A)   +  164 (= B)    =   544 (= N)     .3015             49.44
                                                                                  1.39

    N²/(AB) = 544²/[(380)(164)] = 4.75;  χ² = (4.75)(1.39) = 6.60
    n = 4, P = .16

[...] we have χ², which for a df of 4 yields a
P of about .16. In other words, once in six trials differences as large as
those in Table 13.9 would occur by chance; hence we have insufficient
evidence for concluding that the universes from which these two samples
were drawn differ in regard to their responses to the asked question.
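The computations of Table 13.9 can be reproduced with a short sketch. It assumes that formula (13.6) has the form implied by Columns D and E of the table, namely χ² = [N²/(AB)] [Σ B_i²/(A_i + B_i) − B²/N], and it uses the standard closed form for the 4-df tail area in place of a table lookup; the function names are hypothetical.

```python
import math


def two_by_k_chi2(a_counts, b_counts):
    """Chi square for a 2 by k table from the column-E style computation (assumed form of 13.6)."""
    a, b = sum(a_counts), sum(b_counts)
    n = a + b
    bracket = sum(bi * bi / (ai + bi) for ai, bi in zip(a_counts, b_counts)) - b * b / n
    return n * n / (a * b) * bracket


def chi2_tail_df4(x):
    """Area beyond x for 4 df (closed form)."""
    return math.exp(-x / 2) * (1 + x / 2)


group_1 = [27, 26, 247, 41, 39]
group_2 = [15, 16, 110, 8, 15]

chi2 = two_by_k_chi2(group_1, group_2)
p = chi2_tail_df4(chi2)
print(f"chi2 = {chi2:.2f}, df = {len(group_1) - 1}, P = {p:.2f}")
```

Carried without intermediate rounding, the computation gives a χ² of about 6.54 rather than the 6.60 obtained in the table from rounded column entries; P remains about .16 either way.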

If we had to depend on the D/S_D technique for testing the significance of
the group differences in Table 13.9, five z ratios would result: for each
category there is a possible difference in proportions or percentages with a
standard error for each difference. The five zs might, and usually would,
lead to five different P values with a consequent predicament as to
interpretation. Offhand, it might be argued that, if any z so determined reached
an acceptable level of significance, we would be justified in concluding that
the difference between the groups was real rather than chance. That such
an argument may be fallacious is well illustrated by the data of Table 13.9,
which are actual data. When these data first came to the author's attention,
the table was in percentage form with a z worked out only for the category
showing the largest difference. This z, based on formula (5.6), was 2.54,
which is near the P = .01 level of significance, and it had accordingly been
concluded that a real difference had been found. Now, when we consider
the χ² P of .16 for the over-all comparison, we are not justified in placing
much confidence in such a conclusion.

Why the apparent inconsistency between two tests of significance?
Since most investigators are looking for group differences rather than
group similarities, there is the tendency to single out a category for
comparison, not because of intrinsic a priori interest in that category but
because it happens to yield the largest difference. By this a posteriori
[...] over-all single index of significance, but also helps us [...]
conclusions.

Application to k by / tables, Consider the data of Table 13.10, which 
contains a contingency-type table involving three £roups and three possible 


Table 13.10. Table of frequency of three possible responses for three groups of 
theses add downward to 100* 
Motivation of 


Be Group 
Conscientious 

Objectors I IL III Total 
Not cowards 24(27.0) 56(53.8) 71(69.6) 151 
Partly cowards 30(33.7) 23(22.1) 19(18.6) 72 
Cowards 35(39.3) 25(24.0) 12(11.8) 72 


Ns 89(100.0) 104(99.9) 102(100.0) 295 
* Data from Leo Crespi, Js Psychol., 1945, 19, р. 285. 
opinion responses. To test the si 
groups by use of the DJS,, tech 
centages for 8гсир I vs. II, I vs, II, 
responses—a total of nine zs, 
— 36.58, which for df — 4 is doubl 


P = .001 point. From Elderton’s 
hence Table 13.10 as a whole exhibj 


“not cowards” response. 
ve the “cowards” response. Now it 


» I. TL, and III, can be (and are) pla 
for amount of education: grammar school, 


ation shown in the
represented by a contingency coefficient of .33, which may seem rather
low in light of the highly significant $\chi^2$ P. This illustrates a point which
most readers will already have grasped: high statistical significance and a
high degree of association are far from synonymous. Consideration of the
data of Table 13.10 readily indicates the difficulty of predicting responses
when the extent of association is represented by a C of .33.

As in the 2 by k table, so here it is better to calculate an over-all $\chi^2$
before examining by the z technique any of the possible separate
comparisons. Unless the $\chi^2$ P is significant, it is unwise to proceed with
such comparisons.

The calculation of $\chi^2$ for a k by l table is greatly facilitated by the
following procedure. Let the observed frequencies be represented by fs as in
Table 13.11. Divide the square of each cell frequency by $N_c$, the total N
for its column; sum these quotients across, one sum for each row; divide
each of these sums by the respective total row frequency, $n_r$; add these
quotients, deduct 1, and multiply by the grand total N. The result is $\chi^2$.
The first set of quotients (kl in number) should be carried to two decimals,
and the second set of quotients should be carried to three decimals. The
computational process is given in symbols in Table 13.11. The first
subscript to f indicates the row and the second the column. $\Sigma$ means sum
with c taking on in turn the column designations, 1, 2, 3.

Table 13.11. Schema for calculating $\chi^2$ from a k by l table

             1             2             3             Total
1            f11           f12           f13           n1
             f11²/N1       f12²/N2       f13²/N3       Σ f1c²/Nc
2            f21           f22           f23           n2
             f21²/N1       f22²/N2       f23²/N3       Σ f2c²/Nc
3            f31           f32           f33           n3
             f31²/N1       f32²/N2       f33²/N3       Σ f3c²/Nc
Total        N1            N2            N3            N

\[ \chi^2 = N\left[\frac{\Sigma f_{1c}^2/N_c}{n_1} + \frac{\Sigma f_{2c}^2/N_c}{n_2} + \frac{\Sigma f_{3c}^2/N_c}{n_3} - 1\right] \]
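
The shortcut just described can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original procedure: the function names and the `places` argument are assumptions introduced here, the data are the frequencies of Table 13.10, and the contingency coefficient is taken as the usual $C = \sqrt{\chi^2/(N + \chi^2)}$.

```python
import math

# Observed frequencies of Table 13.10 (rows: responses; columns: groups I, II, III).
observed = [
    [24, 56, 71],   # not cowards
    [30, 23, 19],   # partly cowards
    [35, 25, 12],   # cowards
]

def chi_square_shortcut(table, places=None):
    """N * (sum over rows of (sum_c f^2 / N_c) / n_r  -  1).

    If `places` is given, the second set of quotients is rounded to that
    many decimals, as in hand computation.
    """
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(col_totals)
    total = 0.0
    for row in table:
        row_sum = sum(f * f / n_c for f, n_c in zip(row, col_totals))
        quotient = row_sum / sum(row)
        if places is not None:
            quotient = round(quotient, places)
        total += quotient
    return grand_total * (total - 1.0)

def chi_square_definition(table):
    """Sum of (O - E)^2 / E with E = (row total)(column total) / N."""
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(col_totals)
    chi2 = 0.0
    for row in table:
        n_r = sum(row)
        for f, n_c in zip(row, col_totals):
            e = n_r * n_c / grand_total
            chi2 += (f - e) ** 2 / e
    return chi2

chi2_exact = chi_square_shortcut(observed)               # about 36.68
chi2_rounded = chi_square_shortcut(observed, places=3)   # about 36.58
n_total = sum(sum(row) for row in observed)
c_coefficient = math.sqrt(chi2_exact / (n_total + chi2_exact))  # about .33
print(round(chi2_exact, 2), round(chi2_rounded, 2), round(c_coefficient, 2))
```

Carried to full precision the shortcut and the defining formula agree (about 36.68); rounding the second set of quotients to three decimals, as recommended for hand work, reproduces the 36.58 quoted for Table 13.10.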


Goodness of fit. The use of $\chi^2$ in testing the goodness of fit of a
theoretical curve to an observed frequency distribution is illustrated in
Table 13.12. We start with an actual distribution, usually with more
grouping intervals than in our example, and the descriptive statistical
measures therefor. In fitting the normal curve to the distribution of
Table 13.12, we need N, M, and S. To set up for each interval the frequency
which would hold for the best-fitting normal curve, we go through the

Table 13.12. Goodness of fit of normal curve to Stanford-Binet IQs, Form M

                            Proportionate
 IQ       O        x/S         area          E       O − E    (O − E)²/E
160       3 }
150      13 } 16               .0041         12         4        1.33
                  2.645
140      55                    .0158         47         8        1.36
                  2.057
130     120                    .0512        152       −32        6.74
                  1.468
120     330                    .1186        352       −22        1.38
                   .879
110     610                    .1958        582        28        1.35
                   .291
100     719                    .2316        688        31        1.40
                  −.298
 90     592                    .1950        579        13         .29
                  −.886
 80     338                    .1177        350       −12         .41
                 −1.475
 70     130                    .0506        150       −20        2.67
                 −2.064
 60      48                    .0155         46         2         .09
                 −2.652
 50       7 }
 40       4 } 12               .0040         12         0         .00
 30       1 }

       2970 = N                .9999       2970         0       17.02 = χ²
M = 104.56                                                      df = 8
S = 16.99                                                       P = .03

119.5, since IQs are rounded to the nearest integer. Then (109.5
− 104.56)/16.99 = .2907 as the x/S for the lower limit, and (119.5
− 104.56)/16.99 = .8793 as the x/S for the upper limit of the 110-119
interval. Of course, .8793 is also the lower limit of the 120-129 interval.
Now the difference, .8793 − .2907 = .5886, is the same as 10/16.99 or
i/S, which is the interval width expressed in x/S units. Adding .5886 once
to .2907 gives .879 (it is sufficient to retain three decimals); adding it twice
gives 1.468; and so on. Then subtracting .5886 once from .2907 gives
−.298; subtracting twice gives −.886; etc.

When the boundary limits in terms of x/S have been set up, the
proportionate area for a given interval is found by using the table of normal
curve areas. The two top intervals have been combined, and likewise the
three bottom intervals, so as to have no expected frequencies less than 10.
The proportionate areas, .0041 and .0040, represent the areas beyond
given points, and the Es at top and bottom are the number of cases expected
beyond these same points. Note that the sum of the proportions should be
unity within limits of rounding errors, and that the sum of the expected
frequencies should be the same as the sum of the observed frequencies.
Perhaps it is unnecessary to point out that the expected frequencies form
an exactly (within limits of rounding errors and for the given intervals)
normal distribution which will yield the same M and S as the observed
distribution with which we started.
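
The boundary and area steps for Table 13.12 can be sketched as follows. This is a hedged illustration only: the names are introduced here, and the normal curve areas are computed with `math.erf` rather than read from a printed table, so the last decimal may differ slightly from the tabled proportions.

```python
import math

N, M, S = 2970, 104.56, 16.99          # values given with Table 13.12

# Exact lower limits of the intervals 60-69, 70-79, ..., 150-159.
# Everything below 59.5 and everything above 149.5 forms a combined tail.
lower_limits = [59.5 + 10 * k for k in range(10)]   # 59.5, 69.5, ..., 149.5

def normal_cdf(z):
    """Area under the unit normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_frequencies(limits, n, mean, sd):
    """Return (x/S boundaries, proportionate areas, expected frequencies)."""
    z = [(b - mean) / sd for b in limits]
    cdf = [0.0] + [normal_cdf(v) for v in z] + [1.0]
    areas = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    return z, areas, [n * a for a in areas]

z, areas, expected = expected_frequencies(lower_limits, N, M, S)
for label, a, e in zip(["below 60"] + [f"{int(b + 0.5)}-" for b in lower_limits],
                       areas, expected):
    print(f"{label:9s} area = {a:.4f}   E = {e:6.1f}")
```

The areas run from the bottom tail to the top tail, so `areas[6]` and `expected[6]` belong to the 110-119 interval; the areas necessarily sum to unity and the Es to N.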

Straightforward calculation gives a $\chi^2$ of 17.02. With df = 11 − 3
(number of intervals minus the number of constants used in the fitting),
P = .03; i.e., only 3 times in 100 would as large a $\chi^2$ arise by chance, or
only 3 times in 100 would we get a worse fit if the universe of IQs were
distributed as a normal curve. This would lead us to question whether IQs,
as measured by Form M of the 1937 Revision of the Stanford-Binet, are
distributed in the normal curve fashion. The same data with intervals of
size 5 give a $\chi^2$ P of .003, and the degree of kurtosis (by moments) is thrice
its standard error; therefore it can be concluded that the observed
distribution is not a chance departure from a normal distribution.
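
As a check on the arithmetic, the $\chi^2$ of Table 13.12 and its P can be recomputed directly. The sketch below is an illustration under stated assumptions: the function names are introduced here, and instead of Table D the P is obtained from the closed form for the upper tail of chi square, which holds only when df is even.

```python
import math

# O and E columns of Table 13.12, top interval first (tails already combined).
observed = [16, 55, 120, 330, 610, 719, 592, 338, 130, 48, 12]
expected = [12, 47, 152, 352, 582, 688, 579, 350, 150, 46, 12]

def chi_square(obs, exp):
    """Sum of (O - E)^2 / E over the intervals."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def upper_tail_even_df(x, df):
    """P(chi square >= x) for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

chi2 = chi_square(observed, expected)
df = len(observed) - 3          # k intervals minus 3 fitted constants
p = upper_tail_even_df(chi2, df)
print(round(chi2, 2), df, round(p, 3))
```

Summing unrounded terms gives about 17.01 rather than the 17.02 obtained from the two-decimal column; either way P is close to .03 for df = 8.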

Thus the $\chi^2$ technique provides us with a test by means of which we can
judge that the frequencies of a given distribution do not follow the
frequencies of a theoretical curve closely enough to be regarded as chance
departures therefrom. Note that a smaller value for $\chi^2$ for the example
of Table 13.12 would not prove that the universe is normal even though
the P were as large as .90 or .95. This would merely indicate that the given
data were consistent with the normal distribution. As a matter of fact,
so-called excellent fits leading to Ps of .99 or more are suspect. When
P = .01, it is said that chance sampling would lead to a worse fit only once
in 100 times; when P = .99, it is said that chance sampling would lead to a
better fit only once in 100 times. In other words, if P is between .05 and
.01, the hypothesis that the universe distribution is of the normal type (or
whatever type was fitted) is questionable; if P is .01 or less, this hypothesis
is rejected; if P is between .95 and .99, we may suspect the fit as being too
good; if P is .99 or more, we should definitely look for an error in
calculation or for some type of restraint on the operation of chance. Too
good a fit is as open to question as too poor a fit. If P is between .05 and
.95, the fit is said to be satisfactory.


When the goodness of fit of frequency curves is being tested, the df
depends on the number of grouping intervals and on the number of

\[ f_1 + f_2 + f_3 + \cdots + f_k = N \]
\[ f_1X_1 + f_2X_2 + f_3X_3 + \cdots + f_kX_k = NM \]
\[ f_1x_1^2 + f_2x_2^2 + f_3x_3^2 + \cdots + f_kx_k^2 = NS^2 \]
Now, if all the 
the right of the 
numerically. The resu 
sign and then combi 


and fy, those parts to 
quations could be added 


beyond the f, term in each 


of the other two equati numerically, shifted to the 


right, and combined numerically with the const 


ant, NM for the second and 
NS? for the third equation. TT 
This procedure will lead to three simultaneous е uati i 
t and 
J as the unknowns. NANOS with fi, fa 
\[ f_1 + f_2 + f_3 = A \text{ (say)} \]
\[ f_1X_1 + f_2X_2 + f_3X_3 = B \text{ (say)} \]
\[ f_1x_1^2 + f_2x_2^2 + f_3x_3^2 = C \text{ (say)} \]


unknowns. i i 
ns. For our p ; this means that, as soon as the 
frequencies for all but
remaining frequencies are not "free to vary"; they are fixed because of the
requirements that the frequencies or functions thereof must add to N, NM,
and $NS^2$. We accordingly lose 3 degrees of freedom, and therefore when
we are testing the fit of a normal curve to a distribution with k intervals,
the df is k − 3.

Although the Kolmogorov-Smirnov (K-S) test does not involve $\chi^2$, we
include it here since it also provides a test of goodness of fit. The procedure
is relatively simple. The k observed frequencies are converted to
cumulative frequencies, which are divided by N to secure cumulative
proportions. For the given M and S, the proportions per interval expected on
the basis of the normal curve are calculated (e.g., the proportionate area
column of Table 13.12), then cumulated. We thus have two sets of cumulative
proportions. The k pairs of values are examined to find the largest pair
difference, D; that is, the largest discrepancy between observed and
expected. (This D is, of course, in proportion, not percentage, units.)

The sampling distribution of D is such that for N greater than 35, D
must reach:

    $1.14/\sqrt{N}$ for significance at the P = .10 level
    $1.36/\sqrt{N}$ for significance at the P = .05 level
    $1.63/\sqrt{N}$ for significance at the P = .01 level


The advantages of the K-S test over the $\chi^2$ test for goodness of fit are
twofold: the K-S test is applicable for smaller N, and it is a more powerful
test than the $\chi^2$ test. The latter advantage means that departure from
normal form is more apt to be detected by the K-S test. Stated differently,
compared to the $\chi^2$ test the K-S test is less apt to mislead us into accepting
the hypothesis of normality of distribution.
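
The K-S steps can be sketched with the grouped data of Table 13.12. This is a minimal illustration, assuming the O and proportionate area columns as printed there; the function name is introduced here, and only the .05 and .01 criteria quoted above are used.

```python
import math

# O column and proportionate areas of Table 13.12, top interval first.
observed = [16, 55, 120, 330, 610, 719, 592, 338, 130, 48, 12]
areas = [.0041, .0158, .0512, .1186, .1958, .2316,
         .1950, .1177, .0506, .0155, .0040]

def ks_statistic(obs, proportions):
    """Largest absolute difference between cumulative observed and
    cumulative expected proportions."""
    n = sum(obs)
    cum_obs = cum_exp = 0.0
    largest = 0.0
    for o, p in zip(obs, proportions):
        cum_obs += o / n
        cum_exp += p
        largest = max(largest, abs(cum_obs - cum_exp))
    return largest

n = sum(observed)
d = ks_statistic(observed, areas)
needed_05 = 1.36 / math.sqrt(n)
needed_01 = 1.63 / math.sqrt(n)
print(round(d, 4), round(needed_05, 4), round(needed_01, 4))
```

For these grouped frequencies D comes to about .014, whereas about .025 would be needed at the .05 level.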

Although the method of fitting set forth in Table 13.12 should aid in
comprehending the meaning of goodness of fit (an observed frequency
contrasted with an expected frequency obtained via the concept of area
under a normal curve for the interval), there is a computationally shorter
method for calculating the E values. The x/S value of the midpoint of
each interval is determined, and then the ordinate (or y value) for each
x/S value is written down from Table A. The products, each y times iN/S,
provide the series of Es for the intervals. For the K-S test, y times i/S will
yield the expected proportions, which are then cumulated.
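
The ordinate shortcut can be sketched for a single interval. This is an illustrative sketch only: the names are introduced here, and the ordinate is computed from the normal curve equation instead of being read from Table A.

```python
import math

N, M, S, i = 2970, 104.56, 16.99, 10    # i is the interval width

def normal_ordinate(z):
    """Height y of the unit normal curve at x/S = z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_by_ordinate(midpoint):
    """E for an interval: ordinate at the midpoint times iN/S."""
    z = (midpoint - M) / S
    return normal_ordinate(z) * i * N / S

# The 110-119 interval has exact limits 109.5 and 119.5, hence midpoint 114.5.
e_110 = expected_by_ordinate(114.5)
print(round(e_110, 1))
```

For this interval the shortcut gives an E of about 588, against the 582 obtained from areas in Table 13.12, which shows the order of approximation involved with intervals of this width.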

The $\chi^2$ test can be used to test the difference between two observed
distributions (this becomes a 2 by k table), and the K-S test has been
extended for the same purpose. In comparing two observed distributions,
both the $\chi^2$ test and the K-S test have drawbacks: significance may reflect
difference in location parameters, or in variances, or in distribution shape,
or in any combination of these.

In this chapter we have discussed the essential nature of $\chi^2$ and have
pointed out typical applications. By now the student should appreciate the
advantages of $\chi^2$ over percentage comparisons and have some insight into
the use of $\chi^2$ as a means of testing hypotheses.


EXACT OR DIRECT PROBABILITIES 


The $\chi^2$ Ps obtainable from Table D are approximations in that areas
under a continuous curve are taken as estimates of values which form a
point distribution. Even with Yates' correction for continuity, the
approximation is none too good when E values are less than 5. This raises
the question as to the criterion for judging the closeness of such
approximations, and the answer is that for situations involving 1 degree of
freedom it is possible to specify exact probabilities. How?

First, consider the problem of deciding on the basis of a specified
number of successes whether a chap can distinguish between two cigarette
brands. We learned in Chapter 5 that the exact P for the probability of as
many correct identifications can be obtained by the binomial distribution;
hence we need not use the normal curve or the $\chi^2$ approximation. But such
approximations not only are very convenient computationally for N (or n)
large, but also are accurate enough. In checking a $\chi^2$ P against an exact P
derived from the binomial, we must bear in mind the possibility of
confusing one- and two-tailed tests; both methods should be alike in this
regard.


Second, consider the 5 test of the significance of change (or difference 


ance of association 
Oups. For this situ 


ation the binomial is not 
hen the frequencies a 


ге equal on one, or both, of 
n be obtained for such tables by a rather tedious 
procedure which we shall now describe. It can be shown that the
probability for a particular observed set of frequencies, A, B, C, and D, for
fixed margins is

\[ P = \frac{(A+B)!\,(C+D)!\,(A+C)!\,(B+D)!}{N!\,A!\,B!\,C!\,D!} \]


To have a test comparable to the usual significance test, we would also need
the Ps for all sets of frequencies deviating farther than the observed set
from the null values of no association. This can be made clearer by an
example. Table 13.13 shows an observed set (part I) and sets showing
higher association (parts II and III). Note that each part is derived from
the preceding part by subtracting 1 from both A and D and adding 1 to
both B and C. This process is continued until A or D or both become zero.
Note that the marginal frequencies remain the same.


Table 13.13. Series of fourfold table frequencies required for calculating P 
directly and exactly 


Application of the foregoing formula to each table in turn will yield 
the probability for each set of frequencies, and the sum of these Ps will be 
the probability of as great association (in the given direction) as that 
indicated by the starting (observed) set of frequencies. We have 


\[ P_{I} = \frac{(12!)(8!)(11!)(9!)}{(20!)(3!)(9!)(6!)(2!)} = .0367 \]

\[ P_{II} = \frac{(12!)(8!)(11!)(9!)}{(20!)(2!)(10!)(7!)(1!)} = .0031 \]

\[ P_{III} = \frac{(12!)(8!)(11!)(9!)}{(20!)(1!)(11!)(8!)(0!)} = .0001 \]


The sum of these separate probabilities gives P = .0399, or .04 (to two
decimals) as the probability of obtaining sets as extreme (in one direction)
as the set observed in part I of Table 13.13. If the situation calls for a
two-tailed test, the mere doubling of the calculated one-tailed P will give the
exact probability of as large a difference (or as great an association)
irrespective of direction only when the marginal frequencies are identical;
that is, in a setup such as Table 13.4, only when A + B = B + D.
Otherwise, exactly the same degree of association in the opposite direction
cannot occur, and the doubling of the one-tailed P will only approximate
the required two-tailed P. Consider again the left-hand fourfold table in
Table 13.13. Association in the opposite direction would call for a majority
of the cases in the upper-left and lower-right cells, but how many cases for
as great a negative association as the observed positive association? One
criterion would be to say that for the negative association the frequencies
A, B, C, D should be such that, with the margins unchanged, the value of
(BC − AD) must equal (9 × 6 − 3 × 2) or 48 except for having the
opposite sign. [Note: the value of (BC − AD) enters into both the
fourfold point r and the contingency coefficient for the 2 × 2 table.] If we
try


        8     4    12
        1     7     8
        9    11    20


we get (BC − AD) = (4 × 1 − 8 × 7) = −52, which would indicate a greater
degree of association than the 48. If we try 7 for A, we would have
(BC − AD) = (5 × 2 − 7 × 6) = −32, which leads to a lesser degree of
association than the 48. It is simply impossible with nonsymmetrical marginal
frequencies to have the same degree of negative as positive association.
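
The direct computation can be sketched as below. This is an illustration under stated assumptions: the function names are introduced here, and the observed frequencies A = 3, B = 9, C = 6, D = 2 are those implied by the factorials and by the (9 × 6 − 3 × 2) of the example.

```python
from math import factorial

def table_probability(a, b, c, d):
    """Exact probability of one fourfold table with fixed margins."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator

def one_tailed_p(a, b, c, d):
    """Sum the probabilities of the observed table and of each more extreme
    table, formed by subtracting 1 from A and D and adding 1 to B and C
    until A or D is exhausted."""
    total = 0.0
    while a >= 0 and d >= 0:
        total += table_probability(a, b, c, d)
        a, d, b, c = a - 1, d - 1, b + 1, c + 1
    return total

def bc_minus_ad(a, b, c, d):
    return b * c - a * d

p_observed = table_probability(3, 9, 6, 2)   # about .0367
p_one_tail = one_tailed_p(3, 9, 6, 2)        # about .0399
print(round(p_observed, 4), round(p_one_tail, 4))
print(bc_minus_ad(3, 9, 6, 2), bc_minus_ad(8, 4, 1, 7), bc_minus_ad(7, 5, 2, 6))
```

The last line prints 48, −52, and −32, showing that with these margins no table in the opposite direction yields exactly −48.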


Table 13.14. Sets of frequencies for negative association as
strong as the positive association in I of Table 13.13


The best that we can do to obtain a two-tailed P is to consider all
possible sets of frequencies that give rise to negative association as great as
the observed positive association. For this, only the two tables given in
Table 13.14 qualify. For I of Table 13.14 we have

\[ P_{I} = \frac{12!\,8!\,11!\,9!}{20!\,8!\,4!\,1!\,7!} = .0236 \]

and for II we have

\[ P_{II} = \frac{12!\,8!\,11!\,9!}{20!\,9!\,3!\,0!\,8!} = .0013 \]


oes not correspond to either of the -tailed P: 
doubled. P one-tailed Ps 


The argument regarding the effect of asymmetry on the reciprocity of
one- and two-tailed Ps also holds for $\chi^2$ in those situations involving fairly
sizable Ns when either or both margins of the fourfold table depart
markedly from a 50-50 split. If the smallest E is less than, say, 10, the $\chi^2$
P (even with Yates' correction) when halved does not lead to an entirely
valid one-tailed P value.

The computation of the separate Ps, laborious even with an ordinary 
table of logarithms, is greatly facilitated by a table of the logarithms of 
factorials, such as Table XLIX of Part I of Pearson's Tables for statisticians 
and biometricians. For Ns up to 28, special tables are available (see Table I 
of S. Siegel's Nonparametric statistics, New York: McGraw-Hill, 1956) 
for judging whether the exact probabilities reach certain commonly used 


levels of significance. 


Chapter 14 


INFERENCES ABOUT 
VARIABILITIES 


We now return to the problem of statistical inference based on measures
for continuous variables. This chapter is concerned with inferences
regarding variances and differences between variances, and presents a basic
theoretical distribution which will serve in later chapters when we again
discuss tests based on means.


interest in this chapter to present a 
standard error of the mean formula, 
"take on faith" a part of the next 
tion. Let X (the symbol which we 


assignment thereto is a chance matter. With r standing for the rth row 
and c for the cth column, we will have a table like Table 14.1. 


Summing across columns leads to a $\Sigma X$ ($= N\bar{X}$) for each row. These
to columns depends on chance, there will be no correlation between
columns when R is infinitely large; hence the sums for rows will have a
variance given by the well-known variance theorem for a sum:

\[ \sigma_{\Sigma X}^2 = \sigma_{N\bar{X}}^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_c^2 + \cdots + \sigma_N^2 \]

We have used the symbol $\sigma$ instead of S because, when we regard R as
infinitely large, we are dealing with theoretical rather than observed 
variances. Under the condition of infinite R, all columns will have the 


Table 14.1. Score layout for R replications
(successive samples) of size N

         1     2    ...    c    ...    N
1        X     X           X           X      ΣX = NX̄_1
2        X     X           X           X      ΣX = NX̄_2
r        X     X           X           X      ΣX = NX̄_r
R        X     X           X           X      ΣX = NX̄_R


same variance, $\sigma_X^2$, and their sum will simply be $N\sigma_X^2$. That is,
$\sigma_{N\bar{X}}^2 = N\sigma_X^2$. But this is the variance of N times the row means; if
we divide this variance by $N^2$ we have the variance of the means themselves.
(If this last step is not immediately clear, recall that for any variable, Y,
we have $\sigma_{aY} = a\sigma_Y$, where a is a positive constant.) Thus

\[ \sigma_{\bar{X}}^2 = \frac{\sigma_{N\bar{X}}^2}{N^2} = \frac{N\sigma_X^2}{N^2} = \frac{\sigma_X^2}{N} \]

the square root of which is $\sigma_X/\sqrt{N}$, the familiar formula for the standard
error of the mean. In practice, we need to estimate $\sigma_X$: by $S_X$ if the
sample is large and by $s_X$ if small. No claim is made that this rather simple
derivation would satisfy the step-by-step rigor required by contemporary
mathematical statisticians.

Estimation of variance. To show that $s^2 = \Sigma x^2/(N - 1)$ is an unbiased
estimate of $\sigma^2$, we need to show that the average of a very large (infinite)
number of $s^2$ values is $\sigma^2$. Such a mean value is called an expected value.
If the expected value of a measure corresponds to the population value,
the measure is said to be an unbiased estimate.

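Suppose $s_r^2$ is the estimate based on the rth sample and that we let
$x_r = X_r - \bar{X}_r$ and $x'_r = X_r - \mu$ be deviation scores from the sample
mean, $\bar{X}_r$, and the population mean, $\mu$, respectively. If we subtract $x_r$
from $x'_r$ we have

\[ x'_r - x_r = (X_r - \mu) - (X_r - \bar{X}_r) = \bar{X}_r - \mu \]

hence

\[ x'_r = x_r + (\bar{X}_r - \mu) \quad\text{and}\quad x'^2_r = [x_r + (\bar{X}_r - \mu)]^2 \]

Then

\[ \Sigma x'^2_r = \Sigma x_r^2 + \Sigma(\bar{X}_r - \mu)^2 + 2(\bar{X}_r - \mu)\Sigma x_r \]

Now $\Sigma x_r = 0$ and $\Sigma(\bar{X}_r - \mu)^2 = N(\bar{X}_r - \mu)^2$, thus leading to

\[ \Sigma x'^2_r = \Sigma x_r^2 + N(\bar{X}_r - \mu)^2 \]

or

\[ \Sigma x_r^2 = \Sigma x'^2_r - N(\bar{X}_r - \mu)^2 \]

This permits us to write

\[ s_r^2 = \frac{\Sigma x_r^2}{N-1} = \frac{\Sigma x'^2_r}{N-1} - \frac{N(\bar{X}_r - \mu)^2}{N-1} \]

Suppose we have R replications (samples) and that we average the R
estimates:

\[ \frac{\Sigma s_r^2}{R} = \frac{\Sigma\dfrac{\Sigma x'^2_r}{N-1}}{R} - \frac{\Sigma\dfrac{N(\bar{X}_r - \mu)^2}{N-1}}{R} = \frac{N}{N-1}\cdot\frac{\Sigma\Sigma x'^2_r}{NR} - \frac{N}{N-1}\cdot\frac{\Sigma(\bar{X}_r - \mu)^2}{R} \]

As R becomes infinite, $\Sigma\Sigma x'^2_r/NR$ in the first term, being the mean
squared deviation of all NR scores from the population mean, becomes $\sigma^2$,
which multiplied by the N/(N − 1) term yields N/(N − 1) times $\sigma^2$. In the
second term, $\Sigma(\bar{X}_r - \mu)^2/R$ is the variance of the sample means about the
population mean, or the sampling variance of means, which is $\sigma^2/N$. Thus,
this term becomes

\[ \frac{N}{N-1}\,\sigma_{\bar{X}}^2 = \frac{N}{N-1}\cdot\frac{\sigma^2}{N} = \frac{1}{N-1}\,\sigma^2 \]

Hence, as $R \to \infty$,

\[ \frac{\Sigma s_r^2}{R} = \frac{N}{N-1}\,\sigma^2 - \frac{1}{N-1}\,\sigma^2 \]

Factoring out $\sigma^2/(N - 1)$, the mean or expected value of $s^2$ becomes

\[ \frac{\sigma^2}{N-1}\,(N - 1) = \sigma^2 \]

and therefore $s^2$ is an unbiased estimate of the population variance.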

If the student follows through the foregoing development with N instead
of N − 1 as the divisor, he will discover that $S^2 = \Sigma x^2/N$ is a biased
estimator.
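
Both results can be verified without random sampling by enumerating every possible sample from a small population. The sketch below is only an illustration: the population (the six faces of a die) and the sample size are hypothetical choices made here, and sampling is taken to be with replacement so that all ordered samples are equally likely.

```python
from itertools import product

population = [1, 2, 3, 4, 5, 6]    # hypothetical universe of scores
N = 3                              # hypothetical sample size

mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def estimates(sample):
    """Return (mean, s^2 with divisor N - 1, S^2 with divisor N)."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return mean, sum_sq / (n - 1), sum_sq / n

# Every ordered sample of size N; each is equally likely under
# random sampling with replacement.
results = [estimates(s) for s in product(population, repeat=N)]
R = len(results)

mean_s2 = sum(r[1] for r in results) / R                 # expected value of s^2
mean_S2 = sum(r[2] for r in results) / R                 # expected value of S^2
var_of_means = sum((r[0] - mu) ** 2 for r in results) / R

print(round(sigma2, 4), round(mean_s2, 4), round(mean_S2, 4), round(var_of_means, 4))
```

The average $s^2$ equals $\sigma^2$, the average $S^2$ equals $(N - 1)\sigma^2/N$, and the variance of the sample means equals $\sigma^2/N$, in agreement with the derivations above.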

Variance and $\chi^2$. The student who peruses the statistical literature
may encounter the relationship, $\chi^2 = NS^2/\sigma^2 = (N - 1)s^2/\sigma^2 = \Sigma x^2/\sigma^2$.
Since $\chi^2$ is a random value varying from sample to sample, this implies
that the random sampling distribution of $S^2$ and of $s^2$ is related to the
random sampling distribution of $\chi^2$. We will now attempt to build up the
connection between the two.

Consider the binomial situation with n elements (n coins, n dice, etc.),
with p the probability of success on a single element. The mean number
of successes, np, may be regarded as the expected frequency of success
and the mean number of failures, nq, may be regarded as the expected
frequency of failure. A trial toss (or roll) will lead to an observed number
(frequency) of successes, $O_s$, and an observed frequency of failures, $O_f$.
(If, e.g., 8 of 10 coins show heads, $O_s = 8$ and $O_f = 2$; $O_s + O_f = n$.)
We have

\[ \chi^2 = \Sigma\frac{(O - E)^2}{E} = \frac{(O_s - np)^2}{np} + \frac{(O_f - nq)^2}{nq} \]

which may be rewritten as

\[ \chi^2 = \frac{q(O_s - np)^2}{npq} + \frac{p(O_f - nq)^2}{npq} \]

But for this $\chi^2$ with 1 degree of freedom the numerical (absolute) value
of $(O_f - nq)$ equals that of $(O_s - np)$; that is, the discrepancy is the same
except for sign, which sign disappears in the squaring. We may, therefore,
replace the $(O_f - nq)^2$ of the second term by its exact equivalent,
$(O_s - np)^2$, thus giving

\[ \chi^2 = \frac{q(O_s - np)^2 + p(O_s - np)^2}{npq} = \frac{(O_s - np)^2(q + p)}{npq} \]

Since q + p = 1, we get

\[ \chi^2 = \frac{(O_s - np)^2}{npq} \]

The $(O_s - np)$ is the deviation of a random value from a (theoretical)
mean value, hence may be regarded as an x; and $npq = \sigma^2$ is nothing more
than the theoretical (population) variance of these xs. Accordingly,
$\chi^2 = x^2/\sigma^2$, a relationship which we previously had in another context for
$\chi^2$ when the df is 1.

Next, suppose for the foregoing general binomial situation we have
individuals A, B, C, ..., N, each making a single trial. For each we would
have two O values and the corresponding E values. The $\chi^2$ for A's outcome
could, according to the preceding argument, be expressed (letting $O_A$
represent his frequency of success) as

\[ \chi_A^2 = \frac{(O_A - np)^2}{npq} = \frac{x_A^2}{\sigma^2} \]

and

\[ \chi_B^2 = \frac{(O_B - np)^2}{npq} = \frac{x_B^2}{\sigma^2} \]

\[ \cdots \]

\[ \chi_N^2 = \frac{(O_N - np)^2}{npq} = \frac{x_N^2}{\sigma^2} \]

A summing of these separate $\chi^2$ values leads to a total chi square, $\chi_t^2$,
with df = N, the sum of the dfs for the $\chi^2$s being summed. Thus,

\[ \chi_t^2 = \frac{x_A^2}{\sigma^2} + \frac{x_B^2}{\sigma^2} + \cdots + \frac{x_N^2}{\sigma^2} = \frac{\Sigma x^2}{\sigma^2} \]

Since the sampling values of $\chi_t^2$ follow the chi square distribution with N
degrees of freedom, the values of $\Sigma x^2/\sigma^2$ also follow the chi square
distribution with N df. But this $\Sigma x^2$ differs from the usual $\Sigma x^2$ in that the
xs here used are deviations from a theoretical mean instead of (as usual)
from an observed mean. Since there are no restrictions on these xs, the
df for this $\Sigma x^2$ is N. For the usual $\Sigma x^2$ we have $x = X - \bar{X}$, and the df is
N − 1. With df = N − 1 for the usual $\Sigma x^2$, it follows that for the usual
situation the df associated with $\Sigma x^2/\sigma^2$ is also N − 1; therefore we can
say that $\Sigma x^2/\sigma^2$ is a $\chi^2$ variable with df = N − 1. With x having its usual
meaning (i.e., $x = X - \bar{X}$), we have the two variance estimates,
$S^2 = \Sigma x^2/N$ and $s^2 = \Sigma x^2/(N - 1)$, from which we see that
$\Sigma x^2 = NS^2 = (N - 1)s^2$, and therefore

\[ \chi^2 = \frac{\Sigma x^2}{\sigma^2} = \frac{NS^2}{\sigma^2} = \frac{(N - 1)s^2}{\sigma^2} \] (14.1)

with df = N − 1. That is, both $NS^2/\sigma^2$ and $(N - 1)s^2/\sigma^2$ have sampling
distributions that follow the chi square distribution with N − 1 degrees
of freedom. (Note that since $\sigma$, S, and s pertain to the same variable, X,
we drop the subscript to $\sigma$.)

Inference based on a single variance. The relationship set forth in
(14.1) permits us to use either $S^2$ or $s^2$ as the basis for testing a hypothesis
about a population variance and also for establishing confidence limits for
$\sigma$. Although situations rarely arise in psychological research where logic
leads to a hypothesis regarding $\sigma^2$, we will illustrate the procedure with
a deliberately cooked up example that will help us understand the more
important task of setting a confidence interval for $\sigma^2$.

Suppose an $S^2$ of 100 or an $s^2$ of 105 based on N = 21 and the
hypothesis that $\sigma = 16$, or $\sigma^2 = 256$. If the hypothesis is true, we have

\[ \chi^2 = \frac{\Sigma x^2}{\sigma^2} = \frac{(21)(100)}{256} = \frac{(21 - 1)(105)}{256} = 8.20 \]

with N − 1 = 20 degrees of freedom. When we turn to Table D we find
that the P associated with a $\chi^2$ of 8.20, df = 20, is .99, which is to be
interpreted by saying that a $\chi^2$ greater than 8.20 will occur .99 of the time
by chance; hence, a value as small as 8.20 has a P of .01. That is, if
$\sigma^2 = 256$ is true, then only .01 of the time would we get a sample $S^2$ as low
as 100 or a sample $s^2$ as low as 105. Accordingly, we would reject the
hypothesis at the .01 level of significance.

Next, let us suppose the sample $S^2$ is 457.96 or $s^2$ is 480.86. This
gives $\chi^2$ = (21)(457.96)/256 = (21 − 1)(480.86)/256 = 37.57, which falls
at the P = .01 point. The probability of an $S^2$ as great as 457.96, or an $s^2$
as great as 480.86, as a deviation from $\sigma^2 = 256$ is .01; again, we would
reject the hypothesis.

Turning now to the setting of confidence limits, the student will recall
that when ascertaining the .95 confidence interval for a population mean,
the procedure was to find t for the .05 significance level for the given df.
Call this $t_{.05}$; then the limits are given by $\bar{X} \pm t_{.05}s_{\bar{X}}$. The $t_{.05}$ is of course
the t that cuts off .025 at each end of the t distribution. This suggests that
when setting the .95 confidence interval for $\sigma^2$ we would need to find the $\chi^2$
value that cuts off the top .025 area and the $\chi^2$ that cuts off the bottom
.025, for the given df. Or if we had, for the df, the values for $\chi^2$ that cut off
the top and the bottom .01 area, we would have values to use in setting the
.98 confidence limits. For the latter limits, our rule-of-thumb procedure
will be to find, for the given df, $\chi_{.99}^2$ as the $\chi^2$ that cuts off the lower .01
area and $\chi_{.01}^2$ as the $\chi^2$ that cuts off the upper .01 area.

Returning now to the example with $S^2 = 100$ (or $s^2 = 105$) and df = 20,
we find from Table D that $\chi_{.99}^2$ is 8.26 and $\chi_{.01}^2$ is 37.57. Next we ask
what values of $\sigma^2$ will yield these two $\chi^2$s. Using 8.26, we have

\[ \chi_{.99}^2 = 8.26 = \frac{(21)(100)}{\sigma^2} = \frac{(21 - 1)(105)}{\sigma^2} = \frac{2100}{\sigma^2} \]

as an equation to solve for $\sigma^2$. Thus, $\sigma^2$ = 2100/8.26 = 254.24. Using
$\chi_{.01}^2$ we have $\chi^2 = 37.57 = 2100/\sigma^2$, from which we get $\sigma^2$ = 55.89. Note
that the lower $\chi_{.99}^2$ point leads to the upper limit, whereas the higher
$\chi_{.01}^2$ leads to the lower limit, for $\sigma^2$. If we take the square roots of the
two $\sigma^2$s, we get 7.47 and 15.94 as the limits for the .98 confidence interval
for the population standard deviation.
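
The test of a hypothesized variance and the .98 confidence limits can be sketched as follows. This is an illustration only: the names are introduced here, and the two tabled points, 8.26 and 37.57 for df = 20, are taken from the text as read from Table D rather than computed.

```python
import math

N = 21
S2 = 100.0                      # sample variance with divisor N
s2 = 105.0                      # sample variance with divisor N - 1
sigma2_hypothesis = 256.0

sum_sq = N * S2                 # same quantity as (N - 1) * s2
chi2 = sum_sq / sigma2_hypothesis

# Tabled chi square points for df = 20, as quoted from Table D.
chi2_99 = 8.26                  # cuts off the lower .01 area
chi2_01 = 37.57                 # cuts off the upper .01 area

upper_variance = sum_sq / chi2_99    # the lower point gives the upper limit
lower_variance = sum_sq / chi2_01    # the higher point gives the lower limit

print(round(chi2, 2))
print(round(lower_variance, 2), round(upper_variance, 2))
print(round(math.sqrt(lower_variance), 2), round(math.sqrt(upper_variance), 2))
```

The $\chi^2$ for the hypothesis $\sigma^2 = 256$ comes to about 8.20, and the limits for $\sigma^2$ to about 55.9 and 254.2, or roughly 7.5 and 15.9 for $\sigma$.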


DIFFERENCES BETWEEN VARIATIONS 


Formulas (6.10) and (6.11) for the standard error of the difference 
between standard deviations, for N not small, have been given earlier in 
this book. When testing the difference between two standard deviations or 
two variances we must, as always, distinguish between situations involving 
correlated values and situations in which the measures are independent 
(or based on independent samples). The methods about to be presented 
are applicable for both small and large samples and are based on differences 
between variances rather than differences between standard deviations. 

Differences between correlated variances. Correlated variabilities arise 
when we have two forms of a psychological test administered to the same 
group with an S or s for each form, or when we have the S for a first trial 
vs. the S for a later trial for the same sample, or Ss for the performance of 
one group under different experimental conditions, or Ss based on two 
groups (N pairs of individuals) related by blood or related by matching. 
For such situations the difference between variations can be tested by

\[ t = \frac{(s_1^2 - s_2^2)\sqrt{N - 2}}{\sqrt{4s_1^2s_2^2(1 - r^2)}} \] (14.2)

or its exact equivalent with $s_1^2$ and $s_2^2$ replaced by $S_1^2$ and $S_2^2$. This t
follows the t distribution with N − 2 degrees of freedom.

Differences between independent variances. For the purpose of
testing the difference between uncorrelated Ss or ss, Professor R. A. Fisher
developed the mathematics of the sampling distribution of a function
designated by z and defined as

\[ z = \log_e s_1 - \log_e s_2 \] (14.3)
If successive samples are drawn from a single universe or from two
universes having the same variance, the sampling variation of z will center
at zero and depend on $n_1$ and $n_2$, the two dfs. Note that the sampling
distribution is independent of the universe value of the variance or
standard deviation. In other words, we do not require an estimate of a
standard error which uses information from the samples, as required for
the standard error of the difference between Ss. Probability tables for the z



function are available by which we can, for given dfs, i.e., $n_1$ and $n_2$, find
how large z must be for the .05, the .01, and the .001 levels of significance.

The z, defined by formula (14.3), has one disadvantage: logarithms must
be used. Since (14.3) can be written in the equivalent form

\[ z = \frac{1}{2}\log_e\frac{s_1^2}{s_2^2} \] (14.4)


it is seen that, instead of the difference between two logarithms, we have z
as a function of the ratio of the two estimated variances. From the
sampling distribution of one-half the log of a ratio, the sampling
distribution of the ratio itself can be inferred. For $n_1 = 5$ and $n_2 = 16$, the
value of z which will be exceeded 1 per cent of the time by chance (the .01
probability level) is .7450. This is one-half the log of the ratio of the
two variances, and hence the log of the ratio would be 1.4900; by
reference to a table of natural logarithms the antilog of 1.4900 is found to be
4.44. That is, as large a ratio as 4.44 would occur .01 of the time by chance.
In order to avoid the necessity of using logs, Professor George W. Snedecor
has developed tables for the variance ratio, which is defined as

\[ F = \frac{s_1^2}{s_2^2} \] (14.5)


The equation* of the sampling distribution of F contains two ns: $n_1$ for
the df upon which $s_1$ is based, and $n_2$ as the df for $s_2$. This means that there
is a sampling distribution curve of F for each possible combination of $n_1$
and $n_2$. The probability table for F must accordingly be entered with $n_1$
and $n_2$ in order to learn what level of significance a given F reaches. To use
Table F of the Appendix, we take the larger of the two variance estimates
as the numerator in computing F, and the df for this larger estimate is
symbolized as $n_1$ regardless of any system of subscripts that may have been
used to designate the two groups. Thus the F that is used with the table
is always unity or greater, even though the sampling distribution of F
involves values less than unity. That is, if we were drawing successive
samples from groups A and B and each time took F as $s_A^2/s_B^2$, regardless of
which was the larger estimate, the sampling distribution of F would
obviously involve values below unity as well as above unity. The table,
however, is set up in terms of the greater-than-unity side of the sampling
distribution.


* \[ y = \frac{\Gamma\left(\dfrac{n_1 + n_2}{2}\right) n_1^{n_1/2}\, n_2^{n_2/2}}{\Gamma\left(\dfrac{n_1}{2}\right)\Gamma\left(\dfrac{n_2}{2}\right)} \cdot \frac{F^{(n_1 - 2)/2}}{(n_1F + n_2)^{(n_1 + n_2)/2}} \]

If we wish to judge whether two samples, either large or small, yield a
difference in variability which is large enough to warrant concluding that
the two population variabilities differ, we set up the null hypothesis that no
difference exists in the two population variances. Then, instead of dealing
as usual with the difference between the two estimates, we take their ratio.
Obviously, the departure of this ratio, F, from unity reflects or depends on
the difference between the two variance estimates. If the value of F,
computed with the larger estimate in the numerator, is so large that it is not
reasonable to believe it a chance deviation from a true value of unity, the
null hypothesis is rejected, and it is concluded that the two populations
do not have the same variance. If F is small, i.e., near unity, the null
hypothesis is accepted.


Now it happens that, although the 
the .01, and the .001 levels of Si. 


Broups. For this particular case, 
signifies that as large a difference i 
the time by chance. This is so bec i 


on. If we had this last probability, we 
T one direction only; conversely, if we 
respective dfs would be 7 and 8. In computing F we have 147.62/50.21
= 2.94, and n_1 becomes 8, with n_2 = 7. Turning to Table F, we see that F
would need to be 3.73 for the .05 level, which for this type of problem is
the .10 level, since either estimate might have turned out to be the
larger. Therefore the null hypothesis is not rejected. If we take the
square roots of the two variance estimates, we get ss of 7.09 and 12.15. By
the F test, we are in effect saying that the difference between these two ss
is not significant. As usual, this does not prove the null hypothesis; it
becomes acceptable because we cannot with sufficient certainty reject it.
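The rule of placing the larger estimate in the numerator can be sketched in Python as follows. The function names are illustrative, the tabled value 3.73 is simply the figure quoted above rather than something the code computes, and the assignment of 7 df to the smaller estimate is inferred from the statement that n_1 becomes 8.

```python
def variance_ratio(var_a, df_a, var_b, df_b):
    """Return (F, n_1, n_2) with the larger estimate in the numerator."""
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a


def exceeds_tabled_value(var_a, df_a, var_b, df_b, tabled_f):
    """Compare the computed F with a value read from Table F by the user."""
    f, n1, n2 = variance_ratio(var_a, df_a, var_b, df_b)
    return f, n1, n2, f >= tabled_f


# Figures from the worked example; 3.73 is the tabled F for n_1 = 8, n_2 = 7.
f, n1, n2, reject = exceeds_tabled_value(50.21, 7, 147.62, 8, tabled_f=3.73)
print(round(f, 2), n1, n2, reject)
```

Because the larger estimate is always chosen as numerator, a tabled .05 value corresponds to the .10 level when the direction of the difference was not specified in advance.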

If the research hypothesis being tested or the decision to be made calls
for a one-tailed test, the F values in Table F are applicable without further
ado. As a matter of fact, if the null hypothesis is to be accepted unless
s_1^2 is significantly larger than s_2^2, we would not bother to compute F if
s_1^2 turned out to be smaller than s_2^2.

Differences between several independent variances. We have seen in 
Chapter 13 that \chi^2 can be used to provide an over-all test of the difference
between several independent proportions (p. 230) for C groups and also 
between C correlated proportions (p. 227). In the next chapter we shall 
see how an over-all test can be made for the differences between several 
means, either correlated or independent. We shall consider now an over- 
all test of the difference between three or more variance estimates. This 
test is not applicable when the variances are correlated (based on the 
same group or matched groups). 

Suppose we have k variance estimates, s_1^2, s_2^2, ..., s_i^2, ..., s_k^2,
based on m_1 - 1, m_2 - 1, ..., m_i - 1, ..., m_k - 1 degrees of freedom
respectively. Let N be the sum of the ms. Compute the products: each s^2
times its df. Sum these k products (the equivalent of summing the k sums of
squares of deviations). Let s_w^2 stand for this sum divided by N - k.
Determine the log of each of the k s^2 values, then calculate the products:
each log s^2 times the df for the given s^2. Sum these products, that is,
\sum_i (m_i - 1) \log s_i^2, in which i takes on values from 1 to k.
Determine the log of s_w^2, and compute

    C = 1 + \frac{1}{3(k - 1)} \left( \sum_i \frac{1}{m_i - 1} - \frac{1}{N - k} \right)

Finally, calculate the quantity

    V = \frac{2.3026}{C} \left[ (N - k) \log s_w^2 - \sum_i (m_i - 1) \log s_i^2 \right]    (14.6)

(the factor 2.3026 converts common logarithms to natural logarithms).


The sampling distribution of V follows the \chi^2 distribution with k - 1
degrees of freedom. If V reaches the P = .05 or P = .01 or any a priori
chosen level of significance, the differences between the k variances may be
regarded as nonchance, hence the conclusion that the k groups have not
been drawn from populations having equal variances. If V is not
significant, we accept the hypothesis that the groups have been drawn from
populations having equal variances. The variances are said to be
homogeneous. The procedure just described is known as Bartlett's test for
the homogeneity of variances. It is appropriate for testing the assumption
of homoscedasticity in bivariate correlation scattergrams.
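The steps of Bartlett's test can be sketched in Python. This is a minimal illustration under stated assumptions: common (base-10) logarithms are used, with the factor 2.3026 converting to natural logarithms; the function name `bartlett_v` and the sample figures are hypothetical; and the resulting V must still be referred to a \chi^2 table with k - 1 df.

```python
import math


def bartlett_v(variances, sizes):
    """Return (V, df) for k variance estimates based on the given sample sizes."""
    k = len(variances)
    dfs = [m - 1 for m in sizes]
    n_total = sum(sizes)
    # Pooled estimate: sum of (s^2 times df) divided by N - k.
    pooled = sum(df * s2 for df, s2 in zip(dfs, variances)) / (n_total - k)
    # Correction factor C.
    c = 1.0 + (sum(1.0 / df for df in dfs) - 1.0 / (n_total - k)) / (3.0 * (k - 1))
    bracket = (n_total - k) * math.log10(pooled) - sum(
        df * math.log10(s2) for df, s2 in zip(dfs, variances)
    )
    return 2.3026 * bracket / c, k - 1


# Hypothetical variance estimates and group sizes, for illustration only.
v, df = bartlett_v([50.21, 147.62, 20.0], [8, 9, 10])
print(round(v, 3), df)
```

When all k estimates are equal the bracketed quantity is zero, so V is zero; the more the estimates differ, the larger V becomes.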


F, \chi^2, t, AND z (NORMAL DEVIATE)


Since F involves the ratio of two variance estimates and since there is a
connection between a variance and \chi^2, it is possible to write F as a
function of two \chi^2s. Recall that (N - 1)s^2/\sigma^2 = \chi^2 with df of
N - 1. Solving for s^2 we have s^2 = \sigma^2 \chi^2/(N - 1). Thus
s_1^2 = \sigma_1^2 \chi_1^2/n_1 and s_2^2 = \sigma_2^2 \chi_2^2/n_2, in which
n_1 and n_2 are the dfs. Then

    F = \frac{s_1^2}{s_2^2} = \frac{\sigma_1^2 \chi_1^2 / n_1}{\sigma_2^2 \chi_2^2 / n_2}    (14.7)

Under the null hypothesis condition that \sigma_1^2 = \sigma_2^2, we see that

    F = \frac{\chi_1^2 / n_1}{\chi_2^2 / n_2}    (14.8)



ach divided by its df. This 
he F ratio and also serves as 
stribution of F under the null 
of F for levels of significance 
» ly say that the sample value 
of F is not consistent with the null аа, d that o% nde ёзу. 

If the estimate s_2^2 is based on an infinitely large df (i.e., n_2 =
infinity) it will equal \sigma_2^2, or if s_2^2 happens to be a known
theoretical variance (e.g., the variance of a binomial), we can write F as

    F = \frac{\sigma_1^2 \chi_1^2 / n_1}{\sigma_2^2}    (14.9)

in which the denominator is not given a \chi^2 equivalent because it is
fixed, hence cannot possibly vary as a \chi^2 variable. Again, under
the null condition that \sigma_1^2 = \sigma_2^2, the two variances cancel,
leaving F = \chi_1^2/n_1 when n_2 = \infty. This means that in the \infty
lines of Table F, each F is a \chi^2 divided by n_1. If you had a \chi^2 and
no \chi^2 table available you could divide it by its df and enter the
quotient in the \infty lines of Table F to learn whether it reached one of
three levels of significance.

If n_2 = \infty and n_1 = 1, we see that F = \chi^2. But a \chi^2 with 1 df
equals an x^2/\sigma^2, or a z^2 where z is a unit normal deviate; hence
F = z^2 when n_2 = \infty and n_1 = 1. The first column entries of the
\infty lines of Table F are the squares of x/\sigma values.
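The relation F = z^2 for n_1 = 1 and n_2 = \infty can be illustrated numerically. The sketch below uses the normal distribution from Python's standard library; the function name is an illustrative choice, and the probabilities are two-tailed levels for z, which correspond to the tabled levels for F.

```python
from statistics import NormalDist


def f_for_one_and_infinite_df(two_tailed_p):
    """Square the unit normal deviate cut off by the given two-tailed level."""
    z = NormalDist().inv_cdf(1.0 - two_tailed_p / 2.0)
    return z * z


for p in (0.05, 0.01):
    print(p, round(f_for_one_and_infinite_df(p), 2))
```

The squares of the familiar deviates 1.96 and 2.58 are what appear in the first column of the \infty line, and they are likewise the \chi^2 values for 1 df.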

If n_1 = 1 and n_2 varies, we have

    F = \frac{\sigma_1^2 \chi_1^2}{s_2^2}

in which we deliberately did not replace s_2^2 by its \chi^2 equivalent. The
numerator involves a \chi^2 with 1 df, hence \chi_1^2 = x_1^2/\sigma_1^2, in
which x_1 is a normal deviate with variance \sigma_1^2. Substituting for
\chi_1^2, we have

    F = \frac{\sigma_1^2 (x_1^2 / \sigma_1^2)}{s_2^2} = \frac{x_1^2}{s_2^2}    (14.10)

If \sigma_1^2 = \sigma_2^2 = \sigma^2, we may regard s_2^2 as an estimate of
the common variance, \sigma^2. Now since the x_1 values are random normal
deviates from the assumed common population, we may drop the subscripts and
have F = x^2/s^2 with n_1 = 1 and n_2 = n, where n is the df for s^2. The
square root of F becomes x/s, or the ratio obtained by dividing a normally
distributed variate by an unbiased estimate of its standard deviation. Since
this corresponds to one definition of t, we have F = t^2 when n_1 = 1 and
n_2 = n. All the entries in the first column of Table F, with the exception
of the \infty lines, are t^2 values. The exceptions are z^2, or
(x/\sigma)^2, values of the unit normal distribution.


Chapter 15 
ANALYSIS OF VARIANCE: 
SIMPLE 


Chapter 14 is applicable in a wide 
Tequirement is that we have two 


* See the study by Norton, reported in Lindquist, E. F., Design and analysis of
experiments, New York: Houghton Mifflin, 1953.
It will be recalled that under certain circumstances the squared
correlation coefficient is interpretable in terms of the proportion of variance
"explained." The idea is that variation can be broken down into com- 
ponent parts in such a way as to permit specification of the relative 
importance of the component sources. Back of this is the fact that vari- 
ances are additive to a total variance, as shown when we derived formula 
(9.10), which is basic to the so-called variance theorem. Although this 
theorem is fundamental to the analysis of variance technique, it is not our 
aim to consider methods of estimating the proportion or percentage of 
variance due to a given source but rather to discuss ways of testing whether 
a possible source is contributing to the total variance to a statistically 
significant degree. 


BREAKDOWN OF SUM OF SQUARES 


Let us begin with the simple situation in which the total variation for a 
set of scores based on N individuals is possibly due in part to the fact that 
the total group is heterogeneous with respect to some factor, such as 
socioeconomic level or age or racial origin or type of treatment or method 
used in memorizing or varying level of illumination—any factor which 
permits breaking down the total group into subgroups. In other words, the 
individuals or their scores can be classified into subgroups, or the total 
group can be regarded as made up of specified subgroups. For simplicity, 
let us assume that the subgroups are of the same size, say m cases per 
group, and that we have G groups. Let g stand for any subgroup; i.e., 
g takes on values of 1, 2, 3, ..., G, and let the mean score for the groups be
specified as \bar{X}_1, \bar{X}_2, ..., \bar{X}_g, ..., \bar{X}_G with
\bar{X} as the mean for all groups combined (total mean). Although it is
possible to use a precise notation, such as X_{ig}, to denote the score of
any, the ith, person in group g, we shall in this chapter simply use X as
the score for any individual.

We are now in a position to write an individual's score as a deviation 
from the total mean in terms of the deviation of his score from his group 
mean and the deviation of the group mean from the total mean. Thus, 
for a score in group g, 


    (X - \bar{X}) = (X - \bar{X}_g) + (\bar{X}_g - \bar{X})    (15.1)


which indicates two sources of variation: the variation of a group mean 
from the total mean and the variation of an individual's score from his 
group mean. 

If we rewrite formula (15.1) specifically for group 1, we have 


    (X - \bar{X}) = (X - \bar{X}_1) + (\bar{X}_1 - \bar{X})




Squaring both sides gives 
    (X - \bar{X})^2 = (X - \bar{X}_1)^2 + (\bar{X}_1 - \bar{X})^2 + 2(\bar{X}_1 - \bar{X})(X - \bar{X}_1)


as the squared deviation, from the total mean, of any score in group 1. 
Each of the m persons in the group will have such a squared deviation 
score. We may indicate the sum of the Squares for the m cases as 


    \sum (X - \bar{X})^2 = \sum (X - \bar{X}_1)^2 + \sum (\bar{X}_1 - \bar{X})^2 + 2(\bar{X}_1 - \bar{X}) \sum (X - \bar{X}_1)


Note that in the last term the constants 2 and (\bar{X}_1 - \bar{X}) have
been taken from under the summation sign, and that \sum (X - \bar{X}_1),
being the sum of deviations of a set of scores about their own mean, will be
exactly zero. Therefore, the last term vanishes. Note also that the second
right-hand term involves summing a constant, which is the same as
multiplying it by the number of cases involved in the summation, i.e.,

    \sum (\bar{X}_1 - \bar{X})^2 = m(\bar{X}_1 - \bar{X})^2

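Thus we see that we may write the sum of squares (of deviations) for
the first group and by analogy for the other groups as follows:

    1st group:  \sum (X - \bar{X})^2 = \sum (X - \bar{X}_1)^2 + m(\bar{X}_1 - \bar{X})^2
    2nd group:  \sum (X - \bar{X})^2 = \sum (X - \bar{X}_2)^2 + m(\bar{X}_2 - \bar{X})^2
    gth group:  \sum (X - \bar{X})^2 = \sum (X - \bar{X}_g)^2 + m(\bar{X}_g - \bar{X})^2
    Gth group:  \sum (X - \bar{X})^2 = \sum (X - \bar{X}_G)^2 + m(\bar{X}_G - \bar{X})^2
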
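If we summed the left-hand terms we would obviously have the sum of squares
for the total set of N = mG cases, which can be conveniently written with a
double summation sign. We may sum the first right-hand terms by taking the
sum for each group, then summing over groups; a g under the summation sign
indicates that the subscript g runs from 1 to G. The sum of the other
right-hand terms can be written as m \sum_g (\bar{X}_g - \bar{X})^2. Thus we
have

    \sum\sum (X - \bar{X})^2 = \sum_g \sum (X - \bar{X}_g)^2 + m \sum_g (\bar{X}_g - \bar{X})^2    (15.2)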
g 

as a means of expressing the fact that the total sum of squares (of devia- 

tions) can be broken down into two components, the first of which has to 

do with variation about group means, i.e., within groups, and the second 




of which involves variation of group means about the total mean, i.e., 
between groups. In other words, the total sum of squares is made up of 
two additive parts. If we divide both sides by N or mG, we have the total 
variance broken into additive components, but for our present purposes 
we shall need unbiased estimates of variance, and hence it becomes 
necessary to divide through by degrees of freedom. 
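The additive breakdown of formula (15.2) can be verified numerically. The following Python sketch assumes equal-sized groups, as in the text; the function name and the scores are hypothetical and serve only as illustration.

```python
def sum_of_squares_breakdown(groups):
    """Return (total, within, between) sums of squares for equal-sized groups."""
    m = len(groups[0])
    scores = [x for g in groups for x in g]
    total_mean = sum(scores) / len(scores)
    group_means = [sum(g) / m for g in groups]
    # Deviations of every score from the total mean.
    ss_total = sum((x - total_mean) ** 2 for x in scores)
    # Deviations of every score from its own group mean.
    ss_within = sum(
        (x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g
    )
    # Deviations of group means from the total mean, weighted by m.
    ss_between = m * sum((gm - total_mean) ** 2 for gm in group_means)
    return ss_total, ss_within, ss_between


# Hypothetical scores for G = 3 groups of m = 4 cases each.
groups = [[3, 5, 4, 6], [7, 9, 8, 8], [2, 4, 3, 3]]
total, within, between = sum_of_squares_breakdown(groups)
print(round(total, 4), round(within, 4), round(between, 4))
```

The within and between parts sum to the total apart from rounding error, which is the content of (15.2).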

The correct df can be ascertained by examining the three sums of squares. 
For the total sum of squares we have one restriction, the total mean, and as 
seen in Chapter 7 the df will be N — 1 or mG — 1. The within-groups sum 
is based on N or mG squares, but since these are about G different means 
there are G restrictions, or mG — G (= N — G) degrees of freedom. The 
last or between-groups sum involves G means, varying more or less about 
the total mean; thus, aside from the m factor, it contains G squares with 
one restriction, and the df becomes G — 1. In other words, the G means 
are analogous to varying scores, and obviously the mean of these means 
will equal the total mean. 

We may indicate the division of the three sums of squares by the proper 
dfs as follows: 


    \frac{\sum\sum (X - \bar{X})^2}{mG - 1},    \frac{\sum_g \sum (X - \bar{X}_g)^2}{mG - G},    \frac{m \sum_g (\bar{X}_g - \bar{X})^2}{G - 1}

Notice that we are no longer dealing with an equation. Why? Each 
division will result in a variance estimate, but these are not directly 
additive, which means that we cannot specify what proportion of the 
estimated total variance is due to the between-groups variation. The 
reader should note, however, that the dfs are additive: 


(mG — 1) = (mG — G) + (G — 1) 


Before examining the meaning of these three variance estimates, let us
label them: s^2 for the estimate of total variance, s_w^2 for that based on
the within-groups sum of squares, s_b^2 for that based upon between groups.
Variance estimates are sometimes referred to as "mean squares."


It is of interest to note that

    s_w^2 = \frac{\sum_g \sum (X - \bar{X}_g)^2}{mG - G}

can be written as

    s_w^2 = \frac{1}{G} \sum_g \frac{\sum (X - \bar{X}_g)^2}{m - 1}

which indicates explicitly that s_w^2 may be regarded as the average of G
estimates of the within-groups variance.
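That the within-groups estimate is the plain average of the G separate variance estimates (when every group has m cases) can be checked directly. In this sketch the function names and scores are hypothetical; `statistics.variance` from the standard library divides each group's sum of squares by m - 1.

```python
from statistics import mean, variance


def within_groups_estimate(groups):
    """Within-groups sum of squares divided by mG - G."""
    m = len(groups[0])
    g_count = len(groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_within / (m * g_count - g_count)


def average_of_group_variances(groups):
    """Mean of the G separate unbiased variance estimates."""
    return sum(variance(g) for g in groups) / len(groups)


# Hypothetical scores for G = 3 groups of m = 4 cases each.
groups = [[3, 5, 4, 6], [7, 9, 8, 8], [2, 4, 3, 3]]
print(within_groups_estimate(groups), average_of_group_variances(groups))
```

The equality holds only for groups of equal size; with unequal sizes the pooled estimate is a df-weighted average instead.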
MEANING OF VARIANCE ESTIMATES


Insofar as we think of the total mG cases as a sample drawn from one
population, s^2 will be the best unbiased estimate of the variance of the
population, \sigma^2. If we think of the m cases for each of our G groups as
samples from G possibly different populations, then s_w^2 will be a
composite estimate of the several population variances, a sort of average
which makes sense if the population variances are equal; if the G groups
have been drawn from just one population, this within-groups variance
estimate or s_w^2 will differ little from, but be somewhat smaller than,
s^2. Note that s^2 and s_w^2 cannot be regarded as independent estimates
because the two estimates are based on practically the same deviations:
extreme scores, in either direction, will tend to make both s^2 and s_w^2
large. If m, or the number of cases per group, is taken larger and larger
and if the groups are


regarded as belonging to the same population or populations differing in 
Some respects but having the same mean and vi 
or variate, s and 5°, will 


Let us next look at Sy 


In making this 
um of squares by degrees of freedom; hence 


In order to understand the meaning of i 
a sample of sample means from an indefi 
sample means for groups drawn from the s 
for this universe of Sample means is given 


formula, i.e., \sigma^2_{\bar{X}} = \sigma^2/m. If we were given the value of
\sigma^2_{\bar{X}} and told to estimate the universe trait variance or
\sigma^2, we would simply solve \sigma^2_{\bar{X}} = \sigma^2/m for
\sigma^2. Thus, \sigma^2 = m\sigma^2_{\bar{X}}. If we had only an estimate
of \sigma^2_{\bar{X}}, such as s^2_{\bar{X}}, we could use this estimate as
a basis for estimating the trait variance; i.e., m s^2_{\bar{X}} can be
taken as an estimate of \sigma^2.

we have s_b^2 and s_w^2 (see previous paragraph) as estimates of the same
population variance.


These estimates should 


we may regard our G means as 
nitely large Supply of possible 


ibution. When an obtained Б, or s*,/s*,, 
asis of chance sampling, the implication 
d on the basis of chance sampling, hence 
that there are real differences among the G means. If the null hypothesis of
no difference is true, we expect that in the long run s_b^2 will tend to
equal \sigma^2; that is, the average or expected value of s_b^2 is
\sigma^2. Suppose that the null hypothesis is not true and we ask about the
mean or expected value of s_b^2. An expected value is defined as the mean of
an infinite number of obtained values.

Now

    s_b^2 = \frac{m \sum_g (\bar{X}_g - \bar{X})^2}{G - 1}

in which \bar{X}_g is a sample mean from the gth group. It seems reasonable
to say that the variation among \bar{X}_1, \bar{X}_2, ..., \bar{X}_g, ...,
\bar{X}_G will have two sources under nonnull conditions: the extent of the
variation among the corresponding G population means, \mu_1, \mu_2, ...,
\mu_g, ..., \mu_G, plus a random sampling component because each \bar{X}_g
is based on a sample of m cases. If we symbolized the sampling error by E_g,
we would have E_g = \bar{X}_g - \mu_g, which leads to
\bar{X}_g = \mu_g + E_g. (This is similar to the conception of a score X as
being composed of a true part X_t and an error E so that X = X_t + E.)

We seek an expression for the deviation (\bar{X}_g - \bar{X}) which will
incorporate the two sources of variation. The random error part can be
expressed as (\bar{X}_g - \mu_g) and the possible differences among the
population means as (\mu_g - \mu) in which \mu is the mean of the population
means. Thus we could write

    (\bar{X}_g - \bar{X}) = (\mu_g - \mu) + (\bar{X}_g - \mu_g) - (\bar{X} - \mu)    (15.3)

in which the third term on the right is a (nuisance) variable which must
be incorporated to make the equation balance, i.e., to provide an identity.
To obtain an expression for s_b^2 (aside from the m factor) we could square
and sum either side of the foregoing identity. If we did this we would
discover that the (\bar{X} - \mu) term is really a nuisance. There is a
trick that will help, but before considering the squaring and summing
further, let us set up a scheme which will eventually aid in ascertaining
the mean (expected) value of s_b^2. Let us imagine that we have R
replications or sets of data, each set leading to G means based on m cases.
We might arrange the obtained means in a table (Table 15.1) consisting of R
rows and G columns, each mean having two subscripts, the first of which
designates the row, the second the column. Thus \bar{X}_{36} would be the
mean in the third row (set or replication) and the sixth column (group).
Note the notation for the mean of means (M of Ms) along the bottom and right
margin. The M of Ms on the right are obtained by summing across columns
(groups); the dot replaces the second subscript. For the M of Ms at the
bottom the summing is over rows (replications), hence the dot replaces the
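Table 15.1. Means for G groups, with R replications

            1              2              ...  g              ...  G              M of Ms
1           \bar{X}_{11}   \bar{X}_{12}   ...  \bar{X}_{1g}   ...  \bar{X}_{1G}   \bar{X}_{1.}
2           \bar{X}_{21}   \bar{X}_{22}   ...  \bar{X}_{2g}   ...  \bar{X}_{2G}   \bar{X}_{2.}
r           \bar{X}_{r1}   \bar{X}_{r2}   ...  \bar{X}_{rg}   ...  \bar{X}_{rG}   \bar{X}_{r.}
R           \bar{X}_{R1}   \bar{X}_{R2}   ...  \bar{X}_{Rg}   ...  \bar{X}_{RG}   \bar{X}_{R.}
M of Ms     \bar{X}_{.1}   \bar{X}_{.2}   ...  \bar{X}_{.g}   ...  \bar{X}_{.G}   \bar{X}_{..}
Pop. Ms     \mu_{.1}       \mu_{.2}       ...  \mu_{.g}       ...  \mu_{.G}       \mu_{..}
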

first subscript. Averaging the M of Ms along the right or those at the
bottom leads to the \bar{X}_{..} at the lower-right corner. The \mu values
at the bottom hold for the G populations.

To rewrite (15.3) for our first set (or row) we need only insert the
subscript 1 to designate the first row, along with appropriate dots as part
of the subscript notation. Thus,

    (\bar{X}_{1g} - \bar{X}_{1.}) = (\mu_{.g} - \mu_{..}) + (\bar{X}_{1g} - \mu_{.g}) - (\bar{X}_{1.} - \mu_{..})    (15.4)

or more generally, we have

    (\bar{X}_{rg} - \bar{X}_{r.}) = (\mu_{.g} - \mu_{..}) + (\bar{X}_{rg} - \mu_{.g}) - (\bar{X}_{r.} - \mu_{..})    (15.5)



Squaring and summing the right-hand side as it stands turns out to be
unmanageable. We can avoid the difficult term by shifting the nuisance
component, (\bar{X}_{1.} - \mu_{..}), to the left, rewriting (15.4) as

    (\bar{X}_{1g} - \bar{X}_{1.}) + (\bar{X}_{1.} - \mu_{..}) = (\mu_{.g} - \mu_{..}) + (\bar{X}_{1g} - \mu_{.g})

Squaring both sides and summing over g, we have

    \sum_g (\bar{X}_{1g} - \bar{X}_{1.})^2 + G(\bar{X}_{1.} - \mu_{..})^2 + 2(\bar{X}_{1.} - \mu_{..}) \sum_g (\bar{X}_{1g} - \bar{X}_{1.})
        = \sum_g (\mu_{.g} - \mu_{..})^2 + \sum_g (\bar{X}_{1g} - \mu_{.g})^2 + 2 \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{1g} - \mu_{.g})

Note that since the second term on the left side involves summing a
constant G times, the summation sign was replaced by G; note also that
in the next term, a constant is taken from under the summation sign. Note
further that in this same third term, the expression
\sum_g (\bar{X}_{1g} - \bar{X}_{1.}) is zero, hence this cross-product term
vanishes.

Omitting the vanishing term and shifting the second term on the left to 
the right-hand side, we may write the equation for the first set, and by 
analogy for any set (the rth) and then the Rth (for a total of R replications): 


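    \sum_g (\bar{X}_{1g} - \bar{X}_{1.})^2 = \sum_g (\mu_{.g} - \mu_{..})^2 + \sum_g (\bar{X}_{1g} - \mu_{.g})^2
        - G(\bar{X}_{1.} - \mu_{..})^2 + 2 \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{1g} - \mu_{.g})

    \sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2 = \sum_g (\mu_{.g} - \mu_{..})^2 + \sum_g (\bar{X}_{rg} - \mu_{.g})^2
        - G(\bar{X}_{r.} - \mu_{..})^2 + 2 \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{rg} - \mu_{.g})

    \sum_g (\bar{X}_{Rg} - \bar{X}_{R.})^2 = \sum_g (\mu_{.g} - \mu_{..})^2 + \sum_g (\bar{X}_{Rg} - \mu_{.g})^2
        - G(\bar{X}_{R.} - \mu_{..})^2 + 2 \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{Rg} - \mu_{.g})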

The addition of these equations will lead to a new equation for which 
we will need double summation signs, summing over g from 1 to G and 
over r from 1 to R. Thus, 

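    \underbrace{\sum_r \sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2}_{A}
        = \underbrace{R \sum_g (\mu_{.g} - \mu_{..})^2}_{B}
        + \underbrace{\sum_r \sum_g (\bar{X}_{rg} - \mu_{.g})^2}_{C}
        - \underbrace{G \sum_r (\bar{X}_{r.} - \mu_{..})^2}_{D}
        + \underbrace{2 \sum_r \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{rg} - \mu_{.g})}_{E}
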
To facilitate further consideration, the five terms have been designated
by letters. It will be recalled that when we divide
m \sum_g (\bar{X}_g - \bar{X})^2 by its df, G - 1, we get s_b^2, which may
be thought of as m s^2_{\bar{X}}. When we divide the exact equivalent,
m \sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2, by G - 1 we get an s_b^2 for the
rth set, and this s_b^2 might be written as m s^2_{\bar{X}_r}. But note that
in the foregoing R equations we did not (and need not) have m, hence the
division of \sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2 by G - 1 gives
s^2_{\bar{X}_r}.

Our plan of attack is first to note that when each of the R \sum parts of
term A is divided by G - 1, we have a variance estimate, s^2_{\bar{X}_r},
for each of the R replications. If we next sum these (varying) estimates and
divide by R we will have the mean of the R estimates. Then if we think of R
as approaching infinity or as infinitely large, we will have the mean (or
expected) value of the estimate. This process of first dividing
\sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2 by G - 1, then summing over r,
followed by a division by R can be stated as follows:

    \frac{\sum_r \sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2}{R(G - 1)} = \frac{1}{R} \sum_r \frac{\sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2}{G - 1} = \frac{\sum_r s^2_{\bar{X}_r}}{R}
By definition, as R becomes infinitely large we have the expected value of
the variance estimate, s^2_{\bar{X}}. But we wish to express this in terms
of two parts: one which reflects a random error in the observed means
whereas the other reflects possible real differences among the G means,
\mu_{.g}. To do this we will need to work with the right-hand side of the
above five-term equation in order to see what happens when the four
right-hand terms are also divided by R(G - 1) and R is allowed to approach
infinity. Obviously, such a division will maintain the equation. When
dividing, it is necessary that we break up R(G - 1) so as to evaluate a
given term as R becomes infinitely large.


For the B term, we have

    \frac{R \sum_g (\mu_{.g} - \mu_{..})^2}{R(G - 1)} = \frac{\sum_g (\mu_{.g} - \mu_{..})^2}{G - 1}

which is a fixed quantity regardless of R.
For the C term, we may write

    \frac{\sum_r \sum_g (\bar{X}_{rg} - \mu_{.g})^2}{R(G - 1)} = \frac{1}{G - 1} \sum_g \frac{\sum_r (\bar{X}_{rg} - \mu_{.g})^2}{R}

This involves for any one of the G, say the gth group, the squared
deviations of a series of sample means, each based on m cases, about the
population mean for the group. For R infinitely large, the sum of these
squared deviations divided by R is the true (or theoretical population)
variance of sample means, hence C becomes

    \frac{\sum_g \sigma^2_{\bar{X}_g}}{G - 1} = \frac{\sum_g \sigma_g^2 / m}{G - 1} = \frac{G \sigma^2}{(G - 1) m}

in which the sampling variance of the means from the gth group,
\sigma^2_{\bar{X}_g}, has been replaced by the familiar formula for the
sampling variance of means in terms of population score variance,
\sigma_g^2, and sample size, m. It may help the student to note that we are
dealing with the distribution of sample means in any column, the gth, in
Table 15.1. The last step in the foregoing procedure involves the assumption
of homogeneity of variances. When
\sigma_1^2 = \sigma_2^2 = ... = \sigma_G^2 = \sigma^2, the summing over g of
the G variances, all equal, is nothing more than G times \sigma^2, the
common population variance.

For the D term, we have (ignoring its negative sign which is to be picked
up later)

    \frac{G \sum_r (\bar{X}_{r.} - \mu_{..})^2}{R(G - 1)} = \frac{G}{G - 1} \cdot \frac{\sum_r (\bar{X}_{r.} - \mu_{..})^2}{R}

which, as R becomes infinitely large, involves the sampling variance
of the means of means along the right-hand margin of Table 15.1, which
variance we will symbolize as \sigma^2_{\bar{X}_{r.}}. Since each of these R
means is based on mG cases, we could easily jump to the mistaken conclusion
that this variance of sample means is a function of the variance of the mG
scores about \bar{X}_{..} and would therefore depend on the variance within
the G groups plus possible variation among the G groups. Note that although
it is presumed that the m cases of group g have been randomly drawn
from population g, it does not follow that the mG cases have been drawn
randomly from a grand total population made up by combining the G
subpopulations. Instead, the sampling process ensures that each group is
equally represented by m cases, which would not necessarily be true if mG
cases were randomly drawn from the grand total population without the
provision of equal representation. (This involves the concept of stratified
sampling, to be discussed briefly in Chapter 20.)
To evaluate the variance of the \bar{X}_{r.} about \mu_{..}, let us note that

    \bar{X}_{r.} = (\bar{X}_{r1} + \bar{X}_{r2} + ... + \bar{X}_{rg} + ... + \bar{X}_{rG}) / G

which indicates that \bar{X}_{r.} as a random variable is made up by adding
G variables of the form \bar{X}_{rg}, the means in the rth row of Table
15.1. With R infinitely large the \bar{X}_{r1}, \bar{X}_{r2}, ...,
\bar{X}_{rg}, ..., \bar{X}_{rG} will be independent random variables (zero
correlation between all possible pairs of columns in Table 15.1), hence the
variance of the sum in the preceding parentheses will be given by the sum of
the variances for the random variables being summed, and to get the variance
of the averages, \bar{X}_{r.}, we need only divide by G^2. Thus

    \sigma^2_{\bar{X}_{r.}} = (\sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2} + ... + \sigma^2_{\bar{X}_g} + ... + \sigma^2_{\bar{X}_G}) / G^2

but each of the G variances is the sampling variance for the means for a
particular column in Table 15.1, which for infinite R will be nothing more
than \sigma^2/m, with \sigma^2 being the (assumed) common variance for the G
populations. When we sum G of these, we will have G\sigma^2/m, so that

    \sigma^2_{\bar{X}_{r.}} = \frac{G \sigma^2 / m}{G^2} = \frac{\sigma^2}{mG}

Hence the value for the D term becomes

    \frac{G}{G - 1} \sigma^2_{\bar{X}_{r.}} = \frac{G}{G - 1} \cdot \frac{\sigma^2}{mG} = \frac{\sigma^2}{m(G - 1)}

As for the last, or E, term of our five terms, i.e.,

    2 \sum_r \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{rg} - \mu_{.g})

let us note that the contribution of group 1 to this double sum would be

    2 \sum_r (\mu_{.1} - \mu_{..})(\bar{X}_{r1} - \mu_{.1})

which, since (\mu_{.1} - \mu_{..}) is a constant, may be written as

    2 (\mu_{.1} - \mu_{..}) \sum_r (\bar{X}_{r1} - \mu_{.1})

Now with R infinitely large, the mean of the \bar{X}_{r1} will equal
\mu_{.1}, hence we have the sum of a set of deviations about their own mean.
It will be recalled that such a sum is always zero, a fact which will
likewise hold for any and all values of g; therefore, the E term, when
divided by infinite R, vanishes; we need not divide by G - 1 of the
R(G - 1) divisor.
We now bring together the results of dividing the five terms by R(G - 1),
with R becoming infinitely large.

    \frac{\sum_r s^2_{\bar{X}_r}}{R} = \frac{\sum_g (\mu_{.g} - \mu_{..})^2}{G - 1} + \frac{G \sigma^2}{(G - 1) m} - \frac{\sigma^2}{m(G - 1)}

Multiplying both sides by m gives

    \frac{m \sum_r s^2_{\bar{X}_r}}{R} = \frac{m \sum_g (\mu_{.g} - \mu_{..})^2}{G - 1} + \frac{G \sigma^2}{G - 1} - \frac{\sigma^2}{G - 1}

We place the m of the left-hand side under the summation sign, we note
that the last two terms combined become \sigma^2, and we rearrange, giving

    \frac{\sum_r m s^2_{\bar{X}_r}}{R} = \sigma^2 + \frac{m \sum_g (\mu_{.g} - \mu_{..})^2}{G - 1}

Thus the mean of an infinite number of m s^2_{\bar{X}} (or of the exact
equivalent, s_b^2) values is given on the right-hand side of the last
equation. That is, the right side gives the mean or expected value of s_b^2.
Stated differently, we may say that s_b^2 as an estimate includes the two
components on the right. If we let \rightarrow stand for "is an estimate
of," then

    s_b^2 \rightarrow \sigma^2 + \frac{m \sum_g (\mu_{.g} - \mu_{..})^2}{G - 1}

and also

    s_w^2 \rightarrow \sigma^2

which helps us see that F = s_b^2/s_w^2 is a test for the presence of real
differences among the \mu_{.g}, or the G population means.
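The expected value just derived can be illustrated by a small sampling experiment that mimics the R replications of Table 15.1. This is only a sketch under stated assumptions: scores are drawn from normal populations with a common standard deviation, R is large but finite, and the function names, population means, and seed are all hypothetical.

```python
import random


def mean_between_estimate(pop_means, sigma, m, replications, seed=0):
    """Average the between-groups estimate over many simulated replications."""
    rng = random.Random(seed)
    g_count = len(pop_means)
    total = 0.0
    for _ in range(replications):
        # One row of Table 15.1: a sample mean for each of the G groups.
        means = [
            sum(rng.gauss(mu, sigma) for _ in range(m)) / m for mu in pop_means
        ]
        row_mean = sum(means) / g_count
        total += m * sum((xb - row_mean) ** 2 for xb in means) / (g_count - 1)
    return total / replications


def expected_between_estimate(pop_means, sigma, m):
    """Value given by the derivation: sigma^2 + m * sum((mu_g - mu)^2) / (G - 1)."""
    g_count = len(pop_means)
    mu = sum(pop_means) / g_count
    return sigma ** 2 + m * sum((mg - mu) ** 2 for mg in pop_means) / (g_count - 1)


simulated = mean_between_estimate([0.0, 1.0, 2.0], sigma=1.0, m=5, replications=20000)
print(round(simulated, 2), expected_between_estimate([0.0, 1.0, 2.0], 1.0, 5))
```

With equal population means the second component drops out and the average of the simulated estimates settles near \sigma^2, which is why F = s_b^2/s_w^2 tends toward unity under the null hypothesis.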
Next, we distinguish between two differing situations as regards the
term involving the \mu_{.g} values. If the G groups represent, say, G schools
drawn at random from a possible population of schools, we have the
so-called random model, whereas if the G groups are the two sex groups or
five defined socioeconomic groups or groups working under two or more
experimental conditions, etc., we have the fixed effects model. Note that 
for the latter model, the number of groups is typically small and “fixed.” 
Even though we were defining G experimental conditions as, for example, 
G differing degrees of illumination, we would not draw the G levels at 
random from the theoretically possible large number of levels. Instead, 
we would deliberately select G levels so they would be spaced along the 
illumination continuum. If we were interested in the effect of sense 
modality on reaction time, the number of possible sense modalities which 
we could use is fixed. The fixed effects model is sometimes called the 
fixed constants model because the G values of \mu_{.g} are constants; with
no sampling of groups, exactly the same population means are involved 
for each replication. This would not be the case when replication of the 
experiment involved, for example, drawing another sample of schools. 

For this chapter we need not worry further about the models except
to note that for the random model it makes sense to replace the term
containing the random \mu_{.g} by m\sigma_\mu^2, because the sum of squares
is being divided by (G - 1) degrees of freedom and hence is an unbiased
estimate of the variance of the population means based on a sample of size
G. The use of such a symbol when only a few (2 or 3, or more) definitely
fixed \mu_{.g} are involved does not provide a very meaningful description
of the variation among them.

The student who got lost in the foregoing rather tedious, though
rigorous, method for determining the expected value of s_b^2 might like a
more intuitive and more easily understandable approach, which is restricted
to the random model but has similar implication for the fixed effects model.
Recall that when considering the reliability of measurement problem we
regarded the variable score, X, as made up of two variable parts, X_t and E,
so that X = X_t + E, whence we had S_x^2 = S_t^2 + S_e^2. By analogy we may
say that \bar{X}_g as a random variable is made up of two parts, \mu_g and
sampling error, E_g (= \bar{X}_g - \mu_g). Thus \bar{X}_g = \mu_g + E_g and
if we had a sample mean \bar{X}_g from each of the possible populations, the
variance of the distribution of these means would be

    \sigma^2_{\bar{X}} = \sigma_\mu^2 + \frac{\sigma^2}{m}

in which the last term is the sampling error component (each mean is based
on a sample of m cases). Multiplying both sides by m gives

    m\sigma^2_{\bar{X}} = \sigma^2 + m\sigma_\mu^2

Therefore, since m times the variance of the means, when all populations
are represented, can be broken into two components, it follows that the
estimate, m s^2_{\bar{X}} (= s_b^2), may also be subject to the same two
sources of variation.

In practice we do not have a priori knowledge as to whether the second
component, expressed either as m \sum_g (\mu_{.g} - \mu_{..})^2/(G - 1) or
as m\sigma_\mu^2, is not zero. What we have are two estimates of variance,
s_w^2 and s_b^2 (= m s^2_{\bar{X}}). If s_b^2 is significantly larger than
s_w^2, i.e., if F = s_b^2/s_w^2 is beyond the chosen level of significance,
it can be argued that s_b^2 does involve a source of variation over and
above that of random sampling errors in the observed means; hence the second
component is real.

Although the table of F requires that the larger of the two estimates be
used as the numerator in computing the variance ratio, it should be noted
that s_w^2 cannot be significantly larger than s_b^2 unless the operation of
chance sampling has been restricted in some manner. In practical
applications we are primarily and nearly always interested in the case in
which s_b^2 is the larger of the two estimates. If it is smaller than s_w^2
it is ordinarily not necessary to compute F.


We may now summarize the foregoing. When we have scores on G groups of m
cases each, the total sum of squares can be broken down into two additive
parts, that for between and th


, is or 
" 


s_b^2 must be regarded as

    s_b^2 \rightarrow \sigma^2 + \frac{m \sum_g (\mu_{.g} - \mu_{..})^2}{G - 1}    (fixed model)

    s_b^2 \rightarrow \sigma^2 + m\sigma_\mu^2    (random model)
distributed population of scores for the trait or variable as measured and
(2) that the G populations have the same variance. For large samples the 
first assumption can be checked by way of measures of skewness and 
kurtosis relative to their standard errors or by the chi square test of good- 
ness of fit. Unfortunately neither of these checks is very sensitive for small 
samples. The second assumption may be evaluated, regardless of sample 
sizes, by Bartlett's test for the homogeneity of variances. The reader will 
have noted that these two assumptions have to do with the distribution of 
scores within groups, which lead to the denominator $s_w^2$ of F.

Computational formulas. The required arithmetical labor can be 
shortened by resort to the general principle for computing the sum of 
squares of deviations inherent in formula (3.6): 


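$$\sum(X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{N} = \frac{1}{N}\left[N\sum X^2 - \left(\sum X\right)^2\right]$$

Thus we would have

$$\sum\sum(X - \bar{X})^2 = \frac{1}{N}\left[N\sum\sum X^2 - \left(\sum\sum X\right)^2\right] \qquad (15.6)$$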


for total sum of squares, in which the double summation indicates that the 
summing is over all groups. It can be shown by easy algebra that 


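$$\sum_g\sum(X - \bar{X}_g)^2 = \frac{1}{m}\left[m\sum\sum X^2 - \sum_g\left(\sum X\right)^2\right] \qquad (15.7)$$

for within sum of squares and that

$$m\sum_g(\bar{X}_g - \bar{X})^2 = \frac{1}{mG}\left[G\sum_g\left(\sum X\right)^2 - \left(\sum\sum X\right)^2\right] \qquad (15.8)$$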


for between sum of squares. 

Accordingly, to compute the three sums of squares of deviations, we
need to sum all the raw scores, $\sum\sum X$; sum the squares of all the raw
scores, $\sum\sum X^2$; and sum the squares of the separate group sums,
$\sum(\sum X)^2$. These sums can readily be obtained on a calculating machine
by computing $\sum X$ and $\sum X^2$ for each group separately, squaring each
$\sum X$, and then summing the several $\sum X$ values for $\sum\sum X$, the
$\sum X^2$ values for $\sum\sum X^2$, and the $(\sum X)^2$ values for
$\sum(\sum X)^2$.
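The three computational formulas can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and the small data set (two groups of three scores, 1, 2, 3 and 2, 4, 6) are hypothetical and chosen so that the results can be checked by hand.

```python
def sums_of_squares_equal_m(group_sums, group_sums_sq, m):
    """Total, within, and between sums of squares for G groups of m cases.

    group_sums    -- the G values of sum(X), one per group
    group_sums_sq -- the G values of sum(X**2), one per group
    """
    G = len(group_sums)
    N = m * G
    grand_sum = sum(group_sums)                    # double sum of X
    grand_sum_sq = sum(group_sums_sq)              # double sum of X squared
    sq_of_sums = sum(s * s for s in group_sums)    # sum of squared group sums

    total = (N * grand_sum_sq - grand_sum ** 2) / N    # formula (15.6)
    within = (m * grand_sum_sq - sq_of_sums) / m       # formula (15.7)
    between = (G * sq_of_sums - grand_sum ** 2) / N    # formula (15.8)
    return total, within, between


# Hypothetical data: group 1 = 1, 2, 3 and group 2 = 2, 4, 6.
total, within, between = sums_of_squares_equal_m([6, 12], [14, 56], m=3)
print(total, within, between)
```

For these scores the group means are 2 and 4 and the general mean is 3, so the between sum is 3(1) + 3(1) = 6 and the within sum is 2 + 8 = 10; the two add to the total of 16.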


EXAMPLE: TESTING THE SIGNIFICANCE OF DIFFERENCES
BETWEEN SEVERAL MEANS


To illustrate the application of the technique outlined previously we
shall use unpublished data of Wright† on massed vs. distributed practice
in the learning of nonsense syllables by the anticipation method. The
essential comparison is based on the amount of learning shown in 34
minutes by five (= G) groups of sixteen (= m) cases each. The groups
differed in length of rest intervals between trials and/or in the total number
of trials, as indicated at the top of Table 15.2. The scores of all 80 subjects
are included in this table, and the necessary sums are given at the bottom
of the table, separately for each group. Summing across yields the required
double sums. The group means are also given, although not actually
needed in determining F.

† Wright, Suzanne T., Spacing of practice in verbal learning and the maturation
hypothesis, Unpublished Master's Thesis, Stanford University, California, 1946.


Table 15.2. Number of syllables correctly anticipated at the 34th minute of practice

Group                          1        2        3        4        5
Rest interval (minutes)        8      3.5        2     1.25        0
Number of trials               5        8       11       14       29

                               5        8        9       11       17
                               5        7        3       12       16
                               1        4        9       15       18
                               5        4       10       11       11
                               8        1        5       10       15
                               1        7       11        8        9
                               2        5        9       13       18
                               2        6        6       13       13
                               2        8        7        5       12
                               8       14        6        7       15
                               4        8       16       11        8
                               1        5       12       12       13
                               3        1       11       12        7
                               4        5       15        9       15
                               4        8       13       16       15
                               2        5        4        7       13

m                             16       16       16       16       16
$\sum X$                      57 +    102 +    146 +    172 +    215     $\sum\sum X$ = 692
$\sum X^2$                   279 +    768 +  1,550 +  1,982 +  3,059     $\sum\sum X^2$ = 7,638
$(\sum X)^2$               3,249 + 10,404 + 21,316 + 29,584 + 46,225     $\sum(\sum X)^2$ = 110,778
Means                       3.56     6.38     9.12    10.75    13.44     $\bar{X}$ = 8.65


The sums of squares (of deviations) are obtained by substituting in 
formulas (15.6), (15.7), and (15.8): 


$$\sum\sum(X - \bar{X})^2 = \tfrac{1}{80}\left[80(7638) - (692)^2\right] = 1652.20$$

$$\sum_g\sum(X - \bar{X}_g)^2 = \tfrac{1}{16}\left[16(7638) - 110{,}778\right] = 714.38$$

$$m\sum_g(\bar{X}_g - \bar{X})^2 = \tfrac{1}{80}\left[5(110{,}778) - (692)^2\right] = 937.82$$


Table 15.3. Variance table for data of Wright

Source       Sum of Squares     df     Variance Estimate

Between           937.82         4     234.46 = $s_b^2$
Within            714.38        75       9.53 = $s_w^2$
Total            1652.20        79


The between and within sums of squares add to the total sum of squares,
which provides a check on the arithmetic involved in substituting in the
formulas. This does not check on the accuracy of the sums given in
Table 15.2. Note also that the degrees of freedom add to the total df.

The variance ratio, or F, becomes 234.46/9.53 or 24.60. With dfs of
$n_1 = 4$ and $n_2 = 75$, we refer to the table of F to learn whether 24.60
is larger than expected on the basis of chance. That this F is highly
significant is immediately apparent when we note that for the given dfs an 
F of about 5.2 is significant at the .001 level. With the between-groups 
variance estimate significantly larger than that for within groups, we can 
conclude with high confidence that the five sets of scores have not been 
drawn from the same population of scores, or that amount of time spent 
in practice is a real source of variation. This is, of course, equivalent to 
saying that the several group means considered simultaneously differ 
significantly among themselves. 
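The arithmetic of this example can be retraced with a short Python sketch. The function name is hypothetical; the per-group sums are those at the bottom of Table 15.2 (the value 279 for the first group is the sum of its sixteen squared scores). Because the sketch carries unrounded figures, its F comes out near 24.61 rather than the 24.60 obtained from the rounded variance estimates.

```python
def one_way_anova_from_sums(group_sums, group_sums_sq, m):
    """Variance table and F for G groups of m cases, from per-group sums."""
    G = len(group_sums)
    N = m * G
    grand_sum = sum(group_sums)
    grand_sum_sq = sum(group_sums_sq)
    sq_of_sums = sum(s * s for s in group_sums)

    total = (N * grand_sum_sq - grand_sum ** 2) / N    # formula (15.6)
    within = (m * grand_sum_sq - sq_of_sums) / m       # formula (15.7)
    between = (G * sq_of_sums - grand_sum ** 2) / N    # formula (15.8)

    df_between, df_within = G - 1, N - G
    ms_between = between / df_between                  # between-groups estimate
    ms_within = within / df_within                     # within-groups estimate
    return {
        "between": between, "within": within, "total": total,
        "df": (df_between, df_within, N - 1),
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Per-group sums read from the bottom of Table 15.2.
wright = one_way_anova_from_sums(
    group_sums=[57, 102, 146, 172, 215],
    group_sums_sq=[279, 768, 1550, 1982, 3059],
    m=16,
)
for key, value in wright.items():
    print(key, value)
```

The sketch stops at F; judging its significance still requires the table of F for 4 and 75 degrees of freedom.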

In the illustration just given the groups can be arranged in order before 
any of the data are seen, and additional credence can be placed in the 
results because the means follow this ordering. It should be understood, 
however, that the variance technique does not presuppose an a priori 
ordering of the several groups—it is generally applicable for testing the 
significance of the differences between group means regardless of prior 
considerations. 

If only the z or t technique were available and we wished to compare
the means for five groups, it would ordinarily be necessary to compute t or
z for each possible difference, and five means would lead to 5 × 4/2 or 10
differences. Obviously, the variance method requires less computation,
and furthermore it provides an over-all test of significance which is not
subject to the fallacy inherent in singling out the comparison involving the
largest obtained t or z, a practice which is likely to capitalize on chance
differences. This problem is discussed at the end of this chapter.
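The count of separate comparisons, G(G − 1)/2, can be confirmed by listing the pairs. The following sketch uses Python's standard `itertools.combinations`; the helper name is hypothetical.

```python
from itertools import combinations


def pairwise_comparisons(group_labels):
    """List every pair of groups that a separate t or z test would compare."""
    return list(combinations(group_labels, 2))


pairs = pairwise_comparisons([1, 2, 3, 4, 5])
print(len(pairs))   # 5 * 4 / 2 = 10 comparisons
print(pairs)
```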


SPECIAL CASE OF F TEST WHEN $n_1 = 1$


If we had G = 2 groups, the testing of the between-groups variance
would appear to be much like testing the difference between two means.
Let us examine this case by starting with the expressions for the sum of
squares for two groups:

$$\text{1st group:}\quad \sum(X - \bar{X})^2 = \sum(X - \bar{X}_1)^2 + m(\bar{X}_1 - \bar{X})^2$$

$$\text{2nd group:}\quad \sum(X - \bar{X})^2 = \sum(X - \bar{X}_2)^2 + m(\bar{X}_2 - \bar{X})^2$$

Instead of using double summation signs, we may indicate the within-groups
sum of squares as $\sum(X - \bar{X}_1)^2 + \sum(X - \bar{X}_2)^2$, and the
between-groups sum of squares as $m(\bar{X}_1 - \bar{X})^2 + m(\bar{X}_2 - \bar{X})^2$.
The respective dfs will be 2m − 2 and 1. Indicating the division of the sums
of squares by their dfs, we can write the variance ratio as

$$F = \frac{\left[m(\bar{X}_1 - \bar{X})^2 + m(\bar{X}_2 - \bar{X})^2\right]/1}{\left[\sum(X - \bar{X}_1)^2 + \sum(X - \bar{X}_2)^2\right]/(2m - 2)}$$

Since the number of cases for the two groups is the same, it is readily seen
that the mean for one group will be exactly as far above the general mean
($\bar{X}$) as the other group mean is below $\bar{X}$, or that $\bar{X}$ will
bisect the distance between $\bar{X}_1$ and $\bar{X}_2$; therefore
$(\bar{X}_1 - \bar{X})^2 = (\bar{X}_2 - \bar{X})^2 = \tfrac{1}{4}(\bar{X}_1 - \bar{X}_2)^2$.
The numerator for F becomes $(m/2)(\bar{X}_1 - \bar{X}_2)^2$. It will be noted
that the denominator term, which defines $s_w^2$, is identical to the $s^2$
defined by (7.2) in connection with the t test. Accordingly, we may write

$$F = \frac{(m/2)(\bar{X}_1 - \bar{X}_2)^2}{s^2}$$

Dividing both numerator and denominator by m/2, we have

$$F = \frac{(\bar{X}_1 - \bar{X}_2)^2}{2s^2/m}$$

the square root of which is

$$\sqrt{F} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{2s^2/m}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s^2}{m} + \dfrac{s^2}{m}}}$$

which is identical with a formula for t, p. 103. When G = 2 or two groups
are being compared, then $F = t^2$. It can be shown that this is also true
when the Ns or ms for the two groups are unequal. In fact, it can be shown
that, when $n_1 = 1$, the sampling distribution of F becomes the same as that
for $t^2$ provided the estimate based on between groups, i.e., that based on 1
degree of freedom, is used as the numerator regardless of which of the two
estimates is the larger. It is thus seen that the t test is a special case of the
F test. Note that F involves the square of the difference between means;
hence it provides a basis for judging whether a difference between means,
irrespective of direction, is significant (cf. pp. 248-49). The z technique
for comparing the means of two large samples is also a special case of the
more general F test. That is, when $n_1 = 1$ and $n_2$ is not small, the square
root of F is z, interpretable via the normal curve table (Table A of the
Appendix).
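The identity $F = t^2$ for two groups of equal size can be checked numerically. The sketch below is illustrative only: the function name and the two small sets of scores are hypothetical, and t is computed with the pooled within-groups variance as in the derivation above.

```python
from math import sqrt


def t_and_f_two_groups(x1, x2):
    """Return (t, F) for two independent groups of equal size m."""
    m = len(x1)
    if len(x2) != m:
        raise ValueError("this sketch covers the equal-m case only")
    mean1 = sum(x1) / m
    mean2 = sum(x2) / m
    grand_mean = (mean1 + mean2) / 2     # bisects the two means when m is equal

    within = sum((x - mean1) ** 2 for x in x1) + sum((x - mean2) ** 2 for x in x2)
    between = m * (mean1 - grand_mean) ** 2 + m * (mean2 - grand_mean) ** 2

    s2 = within / (2 * m - 2)            # pooled (within-groups) variance estimate
    f_ratio = (between / 1) / s2         # between estimate has 1 df
    t = (mean1 - mean2) / sqrt(s2 / m + s2 / m)
    return t, f_ratio


# Hypothetical scores for two groups of three cases each.
t, f_ratio = t_and_f_two_groups([1, 2, 3], [2, 4, 6])
print(t, f_ratio, t ** 2)
```

Here the within sum is 10, so $s^2$ = 2.5; the between sum is 6, giving F = 2.4, and $t^2 = 4/(5/3)$ is likewise 2.4.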


GROUPS OF UNEQUAL SIZE 


When the number of cases varies from group to group, we may let
$m_1, m_2, \cdots, m_g, \cdots, m_G$ stand for the several Ns. The sum of
squares for the gth group would be written as

$$\sum(X - \bar{X})^2 = \sum(X - \bar{X}_g)^2 + m_g(\bar{X}_g - \bar{X})^2$$

and the double summation over all groups would be

$$\sum\sum(X - \bar{X})^2 = \sum_g\sum(X - \bar{X}_g)^2 + \sum_g m_g(\bar{X}_g - \bar{X})^2$$

which differs from formula (15.2) in that the varying ms must be left under
the summation sign in the last term. In specifying the degrees of freedom,
we must replace mG by N, where N is the total cases for all groups. The
respective dfs become N − 1, N − G, and G − 1. The computational
formulas are changed to

$$\sum\sum(X - \bar{X})^2 = \sum\sum X^2 - \frac{(\sum\sum X)^2}{N} \qquad \text{for total sum} \qquad (15.9)$$

$$\sum_g\sum(X - \bar{X}_g)^2 = \sum\sum X^2 - \sum_g\frac{(\sum X)^2}{m_g} \qquad \text{for within sum} \qquad (15.10)$$

$$\sum_g m_g(\bar{X}_g - \bar{X})^2 = \sum_g\frac{(\sum X)^2}{m_g} - \frac{(\sum\sum X)^2}{N} \qquad \text{for between sum} \qquad (15.11)$$

Note that the second term for the within sum (and the first for the between)
requires that for each group the square of the sum of its scores be first
divided by its m; then the several quotients are summed. An additional
row would be needed along the bottom of Table 15.2 for these quotients if
the ms differed, or the $(\sum X)^2$ row might be replaced by $(\sum X)^2/m_g$
values. A variance table (like Table 15.3) may be formed, and F taken to equal
$s_b^2/s_w^2$, as before. The same interpretation holds: if F is significantly
large, i.e., if $s_b^2$ is significantly larger than $s_w^2$, the variation of
the several group means among themselves is larger than expected on the basis
of sampling; hence nonchance differences exist between the groups. Although
for the unequal m situation the sampling values of $s_b^2/s_w^2$ follow the F
distribution, the use of unequal ms does not provide as sensitive (as powerful)
a test as a test based on equal ms.


If N cases are to be divided into G experimental groups, it is preferable
to assign m = N/G cases to each group, unless there is a cost factor that
differs from one experimental condition to another.
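Formulas (15.9) to (15.11) translate directly into code when the raw scores of each group are at hand. The sketch below is a hypothetical helper, not a prescribed routine, and the two small groups of unequal size are invented for checking by hand.

```python
def sums_of_squares_unequal_m(groups):
    """Sums of squares and dfs for groups whose sizes may differ.

    groups -- a list of lists, one list of raw scores per group
    """
    N = sum(len(g) for g in groups)
    grand_sum = sum(sum(g) for g in groups)              # double sum of X
    grand_sum_sq = sum(x * x for g in groups for x in g) # double sum of X squared
    # For each group, square its sum, divide by its own m, then add the quotients.
    quotients = sum(sum(g) ** 2 / len(g) for g in groups)
    correction = grand_sum ** 2 / N

    total = grand_sum_sq - correction        # formula (15.9)
    within = grand_sum_sq - quotients        # formula (15.10)
    between = quotients - correction         # formula (15.11)
    dfs = (N - 1, N - len(groups), len(groups) - 1)
    return total, within, between, dfs


# Hypothetical groups of 3 and 4 cases.
total, within, between, dfs = sums_of_squares_unequal_m([[1, 2, 3], [2, 4, 6, 8]])
f_ratio = (between / dfs[2]) / (within / dfs[1])
print(total, within, between, dfs, f_ratio)
```

For these scores the within sum is 2 + 20 = 22 and the quotients add to 12 + 100 = 112, so the between sum is 112 − 676/7.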


the groups in order. It might be said parenthetically that the scientific
hypothesis being tested will specify the direction of differences if such are
expected.


TESTING THE SIGNIFICANCE OF THE CORRELATION 
RATIO 


In testing the significance of the correlation ratio for Y on X, for which
we shall use the simpler symbol $\eta$, we apply the analysis of variance,
the grouping being made on the basis of the intervals on the X variable, and
the required sums of squares will be in terms of Y. The sums of squares and
their respective degrees of freedom will be

$$\underbrace{\sum\sum(Y - \bar{Y})^2}_{(N-1)} = \underbrace{\sum_g\sum(Y - \bar{Y}_g)^2}_{(N-G)} + \underbrace{\sum_g m_g(\bar{Y}_g - \bar{Y})^2}_{(G-1)}$$

for G arrays with varying number, $m_g$, of cases per array. From the
definition formula of the correlation ratio we have, in the notation of this
chapter,

$$\eta^2 = 1 - \frac{\sum_g\sum(Y - \bar{Y}_g)^2/N}{\sum\sum(Y - \bar{Y})^2/N}$$

Since N cancels, we see that the following holds:

$$\sum_g\sum(Y - \bar{Y}_g)^2 = (1 - \eta^2)\sum\sum(Y - \bar{Y})^2 = \text{within sum of squares} \qquad (15.12)$$

From the alternate expression for $\eta$ we have

$$\eta^2 = \frac{\sum_g m_g(\bar{Y}_g - \bar{Y})^2/N}{\sum\sum(Y - \bar{Y})^2/N}$$

which leads to

$$\sum_g m_g(\bar{Y}_g - \bar{Y})^2 = \eta^2\sum\sum(Y - \bar{Y})^2 = \text{between sum of squares} \qquad (15.13)$$

When we wish to divide the sum of squares of formula (15.12) or (15.13)
by the proper df, we may choose either the left- or right-hand part as
representing the sum of squares. Thus the between-arrays estimate may be
written as

$$s_b^2 = \frac{\eta^2\sum\sum(Y - \bar{Y})^2}{G - 1}$$

and that for within arrays as

$$s_w^2 = \frac{(1 - \eta^2)\sum\sum(Y - \bar{Y})^2}{N - G}$$

The ratio, $F = s_b^2/s_w^2$, may be written as

$$F = \frac{\eta^2\sum\sum(Y - \bar{Y})^2/(G - 1)}{(1 - \eta^2)\sum\sum(Y - \bar{Y})^2/(N - G)} = \frac{\eta^2/(G - 1)}{(1 - \eta^2)/(N - G)}$$

It is accordingly seen that for fixed dfs the value of F, even though
computed from the sums rather than from their equivalents in terms of
$\eta^2$, can be thought of as depending on the size of $\eta$; therefore a
significant F indicates a significant correlation ratio.

With the three sums of squares computed, we can readily determine
whether any correlation in the sense of the correlation ratio exists, and
we also have the necessary sums for calculating $\eta$ if it is desired to have
this measure of the degree of correlation. A significant F does not, however,
mean a high correlation ratio; with N large, a low $\eta$ can possess
statistical significance.

The computation of the sums of squares is accomplished by means of 
formulas (15.9-15.11) with the Xs replaced by Ys. 
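The equivalence of the two ways of writing F, from the sums of squares or from $\eta^2$, can be seen in a brief sketch. The function name and the figures used (a total sum of 100, a between sum of 40, N = 50, G = 5) are hypothetical.

```python
def eta_squared_and_f(total_ss, between_ss, N, G):
    """Correlation ratio squared and the F for testing it.

    total_ss   -- total sum of squares of Y
    between_ss -- between-arrays sum of squares, formula (15.13)
    """
    eta_sq = between_ss / total_ss
    within_ss = total_ss - between_ss                       # formula (15.12)
    f_from_sums = (between_ss / (G - 1)) / (within_ss / (N - G))
    f_from_eta = (eta_sq / (G - 1)) / ((1 - eta_sq) / (N - G))
    return eta_sq, f_from_sums, f_from_eta


# Hypothetical sums of squares for 50 cases in 5 arrays.
eta_sq, f_from_sums, f_from_eta = eta_squared_and_f(100.0, 40.0, N=50, G=5)
print(eta_sq, f_from_sums, f_from_eta)
```

Both routes give the same F because the total sum of squares cancels from numerator and denominator.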
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SIGNIFICANCE OF LINEAR CORRELATION 


An appreciable correlation between two variables which are linearly
related implies that the slopes of the regression lines are not zero, which
in turn implies that the variance of predicted values is large enough to
have some kind of statistical significance. The variance technique may
be used as a test of the significance of linear regression.

Suppose that we develop the argument in terms of the regression of Y
on X. We may write

$$(Y - \bar{Y}) = (Y - Y') + (Y' - \bar{Y})$$

in which Y′ will vary from person to person in accordance with his X
score. If we square all such $(Y - \bar{Y})$ deviations and sum over all cases,
we get

$$\sum\sum(Y - \bar{Y})^2 = \sum\left[(Y - Y') + (Y' - \bar{Y})\right]^2 = \sum(Y - Y')^2 + \sum(Y' - \bar{Y})^2 + 2\sum(Y - Y')(Y' - \bar{Y})$$

Double summation signs are not needed for clarity on the right-hand side.
The last or cross-product term has to do with the correlation between
predicted values and residuals, but this correlation is always zero, and
hence this term vanishes.

Therefore the total sum of squares can be broken down into two components:
one for regression, $\sum(Y' - \bar{Y})^2$, and one for residuals about the
regression line, $\sum(Y - Y')^2$.


Before attempting to understand the operation of chance sampling, we
should consider the degrees of freedom associated with the sums of squares.
As usual, the total sum of squares is based on N − 1 degrees of freedom.
The df for $\sum(Y - Y')^2$ may not be immediately obvious, but note that, if
N = 2 and variation exists for both X and Y, the regression line would
necessarily pass through the two points defined by the pair of scores, r
would be unity, and $\sum(Y - Y')^2$ would be zero. In other words, with
N = 2, there is no freedom for deviation from the regression line. From
this it would be inferred that N needs to be reduced by 2, or that df
= N − 2, a deduction which is consistent with the fact that, in fitting a
straight line, two constants are determined from the data, and hence two
restrictions are imposed on the N deviations of the type (Y − Y′).

Since the dfs for the component sums of squares are additive to that for
the total, we can determine the df for the regression or $\sum(Y' - \bar{Y})^2$
term by subtracting the df for residuals from that for the total: (N − 1)
− (N − 2) = 1 as the df for the regression term. But determination of a
df by subtraction does not permit the additive check on the correctness of
the dfs which is possible in case each df is ascertained separately on the
basis of some principle. By what principle could we determine that for the
regression sum of squares the proper df is 1? The value of $\sum(Y' - \bar{Y})^2$
will not be changed by shifting from gross scores to deviation scores, i.e.,
by moving the origin to the intersection of $\bar{X}$ and $\bar{Y}$. It will be
recalled that the regression equation in deviation units is y′ = bx (where
b = B of the gross score form), and accordingly we may write

$$\sum(Y' - \bar{Y})^2 = \sum(y' - \bar{y})^2 = \sum(y' - 0)^2 = \sum(bx)^2 = b^2\sum x^2$$

which permits us to examine the source or sources of variation in the
regression sum of squares. Its value depends on $b^2$ and $\sum x^2$, but the
value of $\sum x^2$ does not depend on the degree of correlation. For a fixed
set of Xs, the freedom of $\sum(Y' - \bar{Y})^2$ to vary springs from b, i.e.,
from one value; therefore the df is 1. A slightly different way of considering
the question is to note that since $b = r(S_y/S_x)$ and $\sum x^2 = NS_x^2$, the
sum of the squares of the predicted values can be written as

$$\sum(Y' - \bar{Y})^2 = r^2\frac{S_y^2}{S_x^2}NS_x^2 = Nr^2S_y^2$$

from which it can be argued that, since the variation in predicted values is
a function neither of N nor of the variance of the trait being predicted, it
is a function of one value, the degree of correlation.

Now let us return to a brief consideration of sampling or of the meaning
of the variance estimates which result from dividing the sums of squares
by their dfs. On the basis of the null hypothesis, that the degree of linear
correlation is zero for the population being sampled, the regression line
for the population would pass through $\bar{Y}$, with zero slope or parallel
to the x axis. Hence (Y − Y′) will equal $(Y - \bar{Y})$, and the variance of
the residuals will equal the total variance of the Ys. A sample from
the population will seldom yield zero correlation (or zero regression), and
therefore the residuals will tend to be somewhat reduced, or $\sum(Y - Y')^2$
will tend to be less than $\sum\sum(Y - \bar{Y})^2$. It can be shown that
$\sum(Y - Y')^2/(N - 2)$ gives an unbiased estimate of the population variance
when no correlation exists in the population.

That the estimate based on the regression sum of squares,
$\sum(Y' - \bar{Y})^2$, divided by df = 1, is also an unbiased estimate of the
same population variance may not seem plausible, nor is it easily explained in
an elementary treatment. For any sample, $\sum(Y' - \bar{Y})^2$ equals the
difference between $\sum\sum(Y - \bar{Y})^2$ and $\sum(Y - Y')^2$, and it can be
demonstrated that on the average the value of
$\sum\sum(Y - \bar{Y})^2 - \sum(Y - Y')^2$ will equal
$\sum\sum(Y - \bar{Y})^2/(N - 1)$, or that the mean value of
$\sum(Y' - \bar{Y})^2$ for successive samples will be
$\sum\sum(Y - \bar{Y})^2/(N - 1)$. Since the latter is an unbiased estimate of
the population variance, it follows that $\sum(Y' - \bar{Y})^2/1$ must be an
estimate of the same variance.


Of the three variance estimates, only the estimates based on residuals
and on regression are independent. The sampling distribution of their
ratio is that of F. Let $s_e^2$ stand for the estimate based on the residual
sum of squares and $s_r^2$ stand for the estimate based on predictions by a
linear regression function. Then, if $s_r^2/s_e^2$, with $n_1 = 1$ and
$n_2 = N - 2$, falls at or beyond the .01 level of significance, the null
hypothesis becomes suspect. This means that the $s_r^2$ estimate is larger than
expected on the basis of sampling, from which it may be inferred that
regression is a real source of variation in $\sum(Y' - \bar{Y})^2$, i.e., that
the slope of the regression for the population is not zero, or that some
correlation exists.

We have already noted that

$$\sum(Y' - \bar{Y})^2 = Nr^2S_y^2$$

Since $\sum(Y - Y')^2$ divided by N equals the error of estimate variance,
previously proved to equal $S_y^2(1 - r^2)$, it follows readily that

$$\sum(Y - Y')^2 = N(1 - r^2)S_y^2$$

Accordingly

$$s_r^2 = \frac{\sum(Y' - \bar{Y})^2}{1} = Nr^2S_y^2$$

and

$$s_e^2 = \frac{\sum(Y - Y')^2}{N - 2} = \frac{N(1 - r^2)S_y^2}{N - 2}$$

Therefore

$$F = \frac{Nr^2S_y^2/1}{N(1 - r^2)S_y^2/(N - 2)} = \frac{r^2}{(1 - r^2)/(N - 2)}$$

which is the square of the t, formula (10.2), for testing the significance
of r. Thus, again we have $F = t^2$ when $n_1 = 1$.




The reader will have noted that, since the required sums of squares and
the resulting F can readily be expressed in terms of r, there is no need to
worry further about a computational scheme for securing the sums of
squares. The easier thing to do is simply to compute r. After that is done,
either the F or the t test may be used for judging whether the correlation
is significant. This discussion of the linear correlation problem here should
help the student appreciate the generality of the analysis of variance
technique and should also provide him with relevant concepts for
understanding the test for curvilinearity of regression, to which we now turn.
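Once r is in hand, F follows from r and N alone. The sketch below assumes the usual form of the t for a correlation coefficient, $t = r\sqrt{N - 2}/\sqrt{1 - r^2}$, which is taken here to be what formula (10.2) expresses; the function name and the values r = .50, N = 27 are hypothetical.

```python
from math import sqrt


def f_and_t_for_r(r, N):
    """F (with 1 and N - 2 df) and t (with N - 2 df) for testing r against zero."""
    f_ratio = r ** 2 / ((1 - r ** 2) / (N - 2))
    t = r * sqrt(N - 2) / sqrt(1 - r ** 2)   # assumed form of formula (10.2)
    return f_ratio, t


# Hypothetical correlation of .50 based on 27 cases.
f_ratio, t = f_and_t_for_r(0.50, 27)
print(f_ratio, t, t ** 2)
```

With these values F = .25(25)/.75, and the square of t gives the same figure.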


TESTING LINEARITY OF REGRESSION 


We have seen that the correlation ratio is a general measure of the
degree of correlation and that r measures the degree of linear relationship.
Even though the regression of Y on X for a population be exactly linear,
it will be found for a sample that the means of the arrays will show some
deviation from a straight line; hence, as previously pointed out, the
correlation ratio will tend to be larger than r. How large should the
difference between $\eta$ and r be before we suspect nonlinearity, or how much
can the array means deviate from a straight line by chance? Before the
development of the analysis of variance technique, the inadequate Blakeman
criterion was used to answer the foregoing. In presenting the currently
accepted method, we shall carry the argument through on the basis of the
regression of Y on X.

Imagine a scatter diagram with regression line drawn and the array
mean located in each vertical array. For a score in the gth array, the
deviation of Y from $\bar{Y}$ can be thought of in terms of its deviation from
the array mean, $\bar{Y}_g$, plus the deviation of the array mean from the
predicted value, $Y'_g$, plus the deviation of the predicted value from the
total mean. In symbols,

$$(Y - \bar{Y}) = (Y - \bar{Y}_g) + (\bar{Y}_g - Y'_g) + (Y'_g - \bar{Y})$$

Squaring and summing for the $m_g$ cases in each array and then summing
over all G arrays (equivalent to summing over all groups), we have

$$\sum\sum(Y - \bar{Y})^2 = \sum_g\sum(Y - \bar{Y}_g)^2 + \sum_g m_g(\bar{Y}_g - Y'_g)^2 + \sum_g m_g(Y'_g - \bar{Y})^2$$

the cross-product terms having vanished because the component parts are
uncorrelated.

The first component is a sum of squares based on within-array variation
with N − G degrees of freedom. We encountered this in checking the
significance of the correlation ratio, and we then labeled as $s_w^2$ the
variance estimate based thereon.

The second sum involves deviations of array means from linear regression.
Its df will be G − 2 since there are G means and two restrictive
constants in $Y'_g$. If G = 2, the two means cannot vary from the fitted line.
Let us use $s_d^2$ as a symbol for the variance estimate based on this sum of
squares.

The third sum, which has to do with the part of the total variance
predictable by means of linear regression, is very similar to that occurring
a few pages earlier in connection with the F test of the correlation
coefficient. It differs only in that the same value is predicted for all cases
within an array regardless of their location in the X interval defining the
array. This is equivalent to a linear prediction of the mean of the array.
Actually, the numerical value of $\sum(Y' - \bar{Y})^2$ as calculated by
$Nr^2S_y^2$, which equals $r^2\sum\sum(Y - \bar{Y})^2$, will be the same as
$\sum_g m_g(Y'_g - \bar{Y})^2$ computed directly, provided r was originally
determined from a scatter diagram with the same intervals now being used to
define the arrays. We have already seen that the df for this sum is 1, and we
have used $s_r^2$ as a symbol for the estimate based thereon.


It will be recalled that, in the scheme for testing the significance of the
correlation ratio, the total sum of squares was broken down into a
within-array and a between-array part. We now have a breakdown into within
array (as before) plus two additional parts: the sum
$\sum_g m_g(\bar{Y}_g - \bar{Y})^2$ is broken into

$$\sum_g m_g(\bar{Y}_g - \bar{Y})^2 = \sum_g m_g(\bar{Y}_g - Y'_g)^2 + \sum_g m_g(Y'_g - \bar{Y})^2$$

It will also be recalled that

$$\sum_g m_g(\bar{Y}_g - \bar{Y})^2 = \eta^2\sum\sum(Y - \bar{Y})^2$$

and that

$$\sum_g m_g(Y'_g - \bar{Y})^2 = r^2\sum\sum(Y - \bar{Y})^2$$

By subtraction, we see that the new sum, $\sum_g m_g(\bar{Y}_g - Y'_g)^2$, is
equivalent to $(\eta^2 - r^2)\sum\sum(Y - \bar{Y})^2$.


For convenience, we shall now assemble in an analysis of variance table
the several symbolic sums of squares, their equivalents, their degrees of
freedom, and a symbol for each variance estimate (Table 15.4). It will be
seen that for the sums of squares, their equivalents, and the dfs, the
following additions hold true:

(a) + (b) = (c)
(a) + (e) = (f)
(c) + (d) = (f)
(a) + (b) + (d) = (f)


Table 15.4. Analysis of variance functions for bivariate correlation

Source of Variation                   Sum of Squares = Equivalent                                                   df       Estimate

(a) Linear regression                 $\sum_g m_g(Y'_g - \bar{Y})^2 = r^2\sum\sum(Y - \bar{Y})^2$                     1        $s_r^2$
(b) Deviation of means from line      $\sum_g m_g(\bar{Y}_g - Y'_g)^2 = (\eta^2 - r^2)\sum\sum(Y - \bar{Y})^2$        G − 2    $s_d^2$
(c) Between-array means               $\sum_g m_g(\bar{Y}_g - \bar{Y})^2 = \eta^2\sum\sum(Y - \bar{Y})^2$             G − 1    $s_b^2$
(d) Within arrays                     $\sum_g\sum(Y - \bar{Y}_g)^2 = (1 - \eta^2)\sum\sum(Y - \bar{Y})^2$             N − G    $s_w^2$
(e) Residual from line                $\sum(Y - Y')^2 = (1 - r^2)\sum\sum(Y - \bar{Y})^2$                             N − 2    $s_e^2$
(f) Total                             $\sum\sum(Y - \bar{Y})^2$                                                       N − 1


The several useful and permissible Fs, or ratios of independent and
unbiased variance estimates, along with the proper dfs ($n_1$ and $n_2$ values)
for entering the table of F, may be stated in summary form:

$F_1 = s_b^2/s_w^2$; $n_1 = G - 1$, $n_2 = N - G$: significance of correlation ratio

$F_2 = s_r^2/s_e^2$; $n_1 = 1$, $n_2 = N - 2$: significance of linear correlation

$F_3 = s_d^2/s_w^2$; $n_1 = G - 2$, $n_2 = N - G$: significance of curvilinearity

We have already discussed the first two of these Fs. If we write the third
in terms of sums and dfs, we have

$$F_3 = \frac{s_d^2}{s_w^2} = \frac{\sum_g m_g(\bar{Y}_g - Y'_g)^2/(G - 2)}{\sum_g\sum(Y - \bar{Y}_g)^2/(N - G)} = \frac{(\eta^2 - r^2)\sum\sum(Y - \bar{Y})^2/(G - 2)}{(1 - \eta^2)\sum\sum(Y - \bar{Y})^2/(N - G)} = \frac{(\eta^2 - r^2)/(G - 2)}{(1 - \eta^2)/(N - G)}$$

which indicates definitely that its value, for given dfs, is a reflection of the
difference between the correlation ratio and the correlation coefficient.
Therefore, in testing the significance of the variation of array means from
linear regression, we are testing the significance of the difference between
$\eta$ and r. If $F_3$ falls beyond the .01 probability level, the hypothesis of
linear regression for the population being sampled is rejected. When this
happens, it follows that the correlation coefficient and a linear regression
function for Y on X are not appropriate measures to use in describing
the relationship.

If we are also interested in testing the significance of the correlation
ratio for X on Y and the linearity of the horizontal array means, the
analysis is carried through with Xs substituted for Ys. Since the number of
grouping intervals on the two axes need not be the same, the value of G
may differ for the two analyses.
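Since each of the three Fs can be written in terms of $\eta^2$, $r^2$, N, and G, they are easily computed together. The following sketch is illustrative; the function name and the values supplied ($\eta^2$ = .50, $r^2$ = .40, N = 52, G = 7) are hypothetical.

```python
def three_f_ratios(eta_sq, r_sq, N, G):
    """F ratios (with their dfs) for the correlation ratio, r, and curvilinearity."""
    within = (1 - eta_sq) / (N - G)          # proportional to the within-arrays estimate
    residual = (1 - r_sq) / (N - 2)          # proportional to the residual estimate
    return {
        "F1": ((eta_sq / (G - 1)) / within, (G - 1, N - G)),          # correlation ratio
        "F2": (r_sq / residual, (1, N - 2)),                          # linear correlation
        "F3": (((eta_sq - r_sq) / (G - 2)) / within, (G - 2, N - G)), # curvilinearity
    }


# Hypothetical values of eta squared and r squared.
ratios = three_f_ratios(eta_sq=0.50, r_sq=0.40, N=52, G=7)
for name, (f_value, dfs) in ratios.items():
    print(name, round(f_value, 3), dfs)
```

The total sum of squares does not enter, since it cancels from every ratio; each F must still be referred to the table of F with the dfs shown.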


ILLUSTRATIVE PROBLEM: r, $\eta$, AND CURVILINEARITY


The foregoing three tests of significance and the computations necessary
thereto may be illustrated by the data of Table 15.5, which gives the
bivariate distribution for the relationship between initial (sum of scores on
trials 1-4) and final (trials 67-70) performance on the Koerth pursuit
rotor. Since it is logical to be concerned with the prediction of final from
initial score, or the regression of Y on X, we shall be dealing with
variations on the Y variable.


Table 15.5. Bivariate scatter for initial and final scores of 92 boys on Koerth
pursuit rotor (Y = final score; X = initial score)


In the first place, the correlation coefficient is computed from the scatter
diagram by the method given in Chapter 8. Its value of .5687 is about .01
lower than the coefficient computed from a scatter with twice as many
intervals. The use of so few intervals for the X variable would obviously
not be recommended for the computation of r, but in this illustration it is
convenient because of page-space limitations. There is the additional
consideration that for computing the correlation ratio we should avoid
having too few cases per array, which if the sample is small may mean only
a few intervals on the independent variable. At least twelve intervals should
be used for the dependent variable. In checking on linearity, it is necessary
that we calculate r from a scatter with the same grouping intervals used in
computing $\eta$, and no corrections for grouping error are needed.

For the computation of the correlation ratio and for the testing of its
significance, we need the within arrays, the between arrays, and the total
sum of squares. These may be computed from coded scores (deviations
from an arbitrary origin in terms of step intervals), and the entire analysis
may be carried through on the basis of coded scores, so that cumbersomely
large figures are avoided. The reader who wishes to follow the
computational procedure will need to note the following features of Table
15.5. The marginal frequencies on the right are for all the Y scores, and
the fs along the bottom margin are the $m_g$s, or cases per array. For each
vertical array and for the right-hand margin, $\sum Y$ and $\sum Y^2$ are
computed in terms of coded values (these correspond to $\sum d$ and $\sum d^2$
of Chapter 3). Summing across the $\sum Y$ and $\sum Y^2$ rows should yield the
$\sum Y$ and $\sum Y^2$ obtained from the marginal distribution. For this
problem, $\sum\sum Y = 636$ and $\sum\sum Y^2 = 4846$. The last row, containing
the several values of $(\sum Y)^2/m_g$, is summed across for the needed
$\sum_g(\sum Y)^2/m_g$, which is 4580.35 in this example. There is no check on
this figure by calculations based on the margin.

In order to get the sums of squares of deviations, the values 636, 4846,
and 4580.35 are substituted in formulas (15.9-15.11) with X replaced by Y.

$$\sum\sum(Y - \bar{Y})^2 = 4846 - \frac{(636)^2}{92} = 449.30$$

$$\sum_g\sum(Y - \bar{Y}_g)^2 = 4846 - 4580.35 = 265.65$$

$$\sum_g m_g(\bar{Y}_g - \bar{Y})^2 = 4580.35 - \frac{(636)^2}{92} = 183.65$$

By formula (15.13) we now obtain

$$\eta^2 = \frac{183.65}{449.30} = .40874; \qquad \eta = .6393$$

which is the correlation ratio for Y on X.

The other sums of squares called for in schematic Table 15.4 may be
calculated from their equivalents in terms of $r^2$ and/or $\eta^2$. Note that
$r^2 = .5687^2 = .32342$.

$$\sum_g m_g(Y'_g - \bar{Y})^2 = (.32342)(449.30) = 145.31$$

$$\sum(Y - Y')^2 = (1 - .32342)(449.30) = 303.99$$

$$\sum_g m_g(\bar{Y}_g - Y'_g)^2 = (.40874 - .32342)(449.30) = 38.34$$


Table 15.6. Analysis of variance table for regression of final (Y) on initial score
for data of Table 15.5

Source                           Sum of Squares     df     Variance Estimate

Linear regression                    145.31           1     145.31 = $s_r^2$
Deviation of means from line          38.34           5       7.67 = $s_d^2$
Between-array means                  183.65           6      30.61 = $s_b^2$
Within arrays                        265.65          85       3.13 = $s_w^2$
Residual from line                   303.99          90       3.38 = $s_e^2$
Total                                449.30          91


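From the variance estimates of Table 15.6 we have the following.

For testing the significance of the correlation ratio we have
$F_1 = 30.61/3.13 = 9.78$, which for $n_1 = 6$ and $n_2 = 85$ is highly
significant. The .001 level of significance requires an F of about 4.0.
For testing the linear correlation, i.e., r, we have
$F_2 = 145.31/3.38 = 42.99$, which for $n_1 = 1$ and $n_2 = 90$ is likewise
highly significant, the .001 level being at an F of about 11.6.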
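The whole of this illustrative problem can be retraced from the three sums, r, N, and the number of arrays. The sketch below is hypothetical in its function name; G = 7 is inferred from the 6 degrees of freedom for between-array means in Table 15.6. Because unrounded figures are carried throughout, the ratios may differ in the second decimal from those formed with the rounded table entries.

```python
def regression_anova(sum_y, sum_y_sq, sum_quotients, r, N, G):
    """Sums of squares and the three F ratios for a bivariate scatter.

    sum_y         -- double sum of (coded) Y
    sum_y_sq      -- double sum of Y squared
    sum_quotients -- sum over arrays of (sum of Y)**2 / m_g
    """
    correction = sum_y ** 2 / N
    total = sum_y_sq - correction            # formula (15.9), Y for X
    within = sum_y_sq - sum_quotients        # formula (15.10)
    between = sum_quotients - correction     # formula (15.11)

    regression = r ** 2 * total              # linear regression sum of squares
    deviation = between - regression         # deviation of array means from line
    residual = total - regression            # residual from line

    s_w = within / (N - G)
    return {
        "eta_sq": between / total,
        "sums": (regression, deviation, between, within, residual, total),
        "F1": (between / (G - 1)) / s_w,             # correlation ratio
        "F2": regression / (residual / (N - 2)),     # linear correlation
        "F3": (deviation / (G - 2)) / s_w,           # curvilinearity
    }


result = regression_anova(636, 4846, 4580.35, r=0.5687, N=92, G=7)
print(result)
```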


The student should keep in mind that the test for linearity can lead to
the definite conclusion that the regression is curvilinear (if F is large
enough), whereas a low F does not prove linearity. Why?

If the hypothesis of linearity is disproved, it follows that the correlation
coefficient is not a suitable figure for describing the relationship. The
correlation ratio can be used to describe the degree of association, but the
form of the relationship should be described by a fitted curve or by a verbal
description of the general curve tendency of the array means. Some readers
will have noted that the correlation ratio cannot be considered very
descriptive of the data of Table 15.5 because of heteroscedasticity.


APPLICATION TO MULTIPLE CORRELATION 


The reader may recall that the methods given in Chapter 11 for judging 
the significance of the multiple correlation coefficient involved unsatis- 
factory approximations. Insofar as we are interested in testing the devia- 
tion of a multiple r from zero, the analysis of variance technique provides 
an exact test which is applicable when the sample is either small or large. 

Let us suppose that Y is a dependent variable which is to be predicted
by a multiple regression equation containing m independent variables
designated by Xs. The prediction equation may be written as

$$Y' = A + B_1X_1 + B_2X_2 + \cdots + B_mX_m$$

in which the Bs are the regression coefficients. The deviation of any
individual's Y score from the mean Y can be expressed as the sum of two
parts: the deviation of his Y from his predicted value plus the deviation
of the predicted value from the mean of the Ys, thus,

$$(Y - \bar{Y}) = (Y - Y') + (Y' - \bar{Y})$$

If we square both sides and sum over all cases, we have

$$\sum\sum(Y - \bar{Y})^2 = \sum(Y - Y')^2 + \sum(Y' - \bar{Y})^2$$

which is exactly analogous to the breakdown used in connection with the
test of the linear correlation coefficient. One part has to do with residuals
about the regression plane, the other with variations in the predicted values.
The cross-product term again vanishes; it can be shown that there is no
correlation between residuals and predicted values.

As previously, we label $\sum (Y - Y')^2$ as the residual sum of squares
and $\sum (Y' - \bar{Y})^2$ as the regression sum of squares. The total sum of
squares will, of course, have $N - 1$ degrees of freedom. The residual sum of
squares will lose dfs according to the number of constants in the regression
equation. We have the constant A, and the number of B constants is m;
hence $df = N - (m + 1) = N - m - 1$ for the residual term. The reader
who does not immediately see the reasonableness of this should consider 
the case of one dependent and two independent variables with varying
scores on N = 3 cases. Imagine that the three scores for each case can be
used to locate a point for each in three-dimensional space, and then think 
of fitting an ordinary plane to these three points. Obviously, the plane can 
be made to pass through all three; hence the prediction would be perfect, 
and there would be no freedom for any of the three points to vary from the 
plane. That is, with N = 3 (and with variation on all three variables), 
the multiple derived therefrom must be unity. 

Now, as to the df for the regression or prediction sum of squares, we note 
that for a fixed set of values for the Xs the variation of this term must 
depend on the slopes of the regression plane or on the Bs. There being m 
Bs, there are m ways in which this sum can vary; therefore df = m. This
is, it will be noted, an extension of the argument used to explain why
df = 1 for testing the linear correlation coefficient. If our df
determinations are correct, we should have $(N - m - 1) + m$ adding to $N - 1$,
which is seen to be the case.

In Chapter 11 it was pointed out that the multiple correlation coefficient
can be defined as

$$r^2_{1.23\ldots} = 1 - \frac{S^2_{1.23\ldots}}{S^2_1}$$

in which $S^2_{1.23\ldots}$ represents the residual variance and $S^2_1$ is the
variance for the dependent variable. Since the residual variance plus the
predicted variance adds to the total, the multiple r can also be expressed as
the ratio of the predicted to the total variance. (Note that we are here
speaking of variances, not estimates.) By definition, the residual variance
is $\sum (Y - Y')^2/N$, the predicted variance is $\sum (Y' - \bar{Y})^2/N$, and
the total variance is $\sum (Y - \bar{Y})^2/N$. We may therefore write the
multiple correlation coefficient, using R in order to avoid subscripts, as

$$R^2 = 1 - \frac{\sum (Y - Y')^2/N}{\sum (Y - \bar{Y})^2/N}$$

from which it is readily seen that

$$\sum (Y - Y')^2 = (1 - R^2)\sum (Y - \bar{Y})^2$$

From the alternative way of regarding multiple correlation, we have

$$R^2 = \frac{\sum (Y' - \bar{Y})^2/N}{\sum (Y - \bar{Y})^2/N}$$

which leads to $\sum (Y' - \bar{Y})^2 = R^2\sum (Y - \bar{Y})^2$.


Thus the sums of squares have their equivalents in terms of R, and 
consequently they may be computed by way of R. The computation of
these sums directly would be a hammer-and-tongs approach which would
involve the laborious task of predicting by means of the regression equation 
the Y for each individual. 

The foregoing may be assembled in a schematic variance table, like 
Table 15.7. As in testing the significance of the ordinary correlation 
coefficient, we set the null hypothesis to the effect that the estimate based 
on the regression sum of squares will differ from that based on the residual 
sum only because of chance sampling errors. The null hypothesis implies 
that, if the entire population were measured, the correlation of the
dependent variable with each independent variable would be zero.


Table 15.7. Variance setup for testing significance of multiple
correlation coefficient

Source        Sum of Squares             Equivalent                        df             Estimate
Regression    $\sum (Y' - \bar{Y})^2$    $R^2\sum (Y - \bar{Y})^2$         $m$            $s^2_{\text{reg}}$
Residual      $\sum (Y - Y')^2$          $(1 - R^2)\sum (Y - \bar{Y})^2$   $N - m - 1$    $s^2_{\text{res}}$
Total         $\sum (Y - \bar{Y})^2$                                       $N - 1$


Now, when a sample is drawn from such a population, the rs will vary more or
less from zero with the result that the multiple R will likewise differ from
zero. If the conditions of the null hypothesis hold true, the sampling
distribution of $s^2_{\text{reg}}/s^2_{\text{res}}$ follows that of the F
distribution with appropriate degrees of freedom. Note that

$$\frac{s^2_{\text{reg}}}{s^2_{\text{res}}}
= \frac{\sum (Y' - \bar{Y})^2/m}{\sum (Y - Y')^2/(N - m - 1)}
= \frac{R^2\sum (Y - \bar{Y})^2/m}{(1 - R^2)\sum (Y - \bar{Y})^2/(N - m - 1)}
= \frac{R^2/m}{(1 - R^2)/(N - m - 1)}$$

hence F is a ratio which depends on R and the dfs. If the numerator is less
than the denominator, we may conclude without reference to the table of F
that R is insignificant. When the numerator is the larger, we judge the
significance of F by entering the table of F with $n_1 = m$ and
$n_2 = N - m - 1$. Once R has been computed, the calculations involved in
checking its significance are so simple that an example would be humdrum.
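
Humdrum or not, the arithmetic of the foregoing ratio can be sketched in a few lines of Python. The function name and the figures (R squared of .25, N = 50, m = 3) are hypothetical and serve only as an illustration; the resulting F must still be referred to a table of F with the returned degrees of freedom.

```python
def multiple_r_f_test(r_squared, n_cases, n_predictors):
    """Return (F, n1, n2) for testing a multiple R against zero."""
    n1 = n_predictors                  # df for the regression sum of squares
    n2 = n_cases - n_predictors - 1    # df for the residual sum of squares
    if n2 <= 0:
        raise ValueError("N must exceed m + 1")
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R squared must lie in [0, 1)")
    # F = (R^2 / m) / ((1 - R^2) / (N - m - 1))
    f_ratio = (r_squared / n1) / ((1.0 - r_squared) / n2)
    return f_ratio, n1, n2


# Hypothetical figures: R = .50 (R squared = .25), N = 50 cases, m = 3 predictors.
f_ratio, n1, n2 = multiple_r_f_test(0.25, 50, 3)
print(f"F = {f_ratio:.2f} with n1 = {n1} and n2 = {n2}")
```

The sketch deliberately stops short of a probability value, since the text relies on the printed F table for that step.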

In the chapter on multiple correlation (Chapter 11), it was pointed out 
that R as computed tends to have a positive bias, the extent of which could
be judged by formula (11.14). This formula can readily be derived by the
use of estimated residual and trait variances in place of actual variances in
formula (11.7). Best or unbiased estimates lead to an unbiased R, or
provide an unbiased estimate of the population value of R. Formula
(11.14) gives this improved estimate, but the improvement is negligible
except when N is small, or when m is large relative to N. It should be
stressed that neither the analysis of variance check on the significance of R
nor the improved estimate of R allows for the fallacy involved in multiple
correlation work when from among a large number of variables a few are
chosen for inclusion in the analysis because they show correlation with the
criterion. Such selection tends to capitalize on rs which are among the
highest partly because of chance errors.

A practical question of considerable importance arises when we wonder
whether the inclusion of additional variables in the multiple regression
equation leads to a significant increase in the accuracy of prediction or
when we wish to know whether the dropping of certain variables results in
a significant decrease in the amount of variance predicted. The inclusion
of additional variables in the equation always tends to reduce the error
of estimate somewhat and leads to an increase in R. Can it be said that the
increase in R possesses statistical significance?

Let $R_1$ be the multiple based on $m_1$ independent variables and $R_2$ be the
value based on $m_2$ variables selected from among the $m_1$ variables. To test
the significance of the difference between $R_1$ and $R_2$ we take

$$F = \frac{(R^2_1 - R^2_2)/(m_1 - m_2)}{(1 - R^2_1)/(N - m_1 - 1)}$$

with $n_1 = m_1 - m_2$ and $n_2 = N - m_1 - 1$. If F falls beyond the .01 point,
we can safely assume that the apparent gain in using the additional variable
or variables possesses statistical significance.
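
A minimal sketch of this comparison follows, assuming the two squared multiples have already been computed. The function name and the illustrative values (squared multiples of .40 and .30, N = 100, five versus three predictors) are hypothetical.

```python
def r_increase_f_test(r2_full, r2_reduced, n_cases, m_full, m_reduced):
    """Return (F, n1, n2) for the gain in R squared from the added variables."""
    if m_reduced >= m_full:
        raise ValueError("the reduced set must contain fewer variables")
    if r2_reduced > r2_full:
        raise ValueError("a subset cannot predict more variance than the full set")
    n1 = m_full - m_reduced        # number of variables added (or dropped)
    n2 = n_cases - m_full - 1      # residual df for the larger equation
    if n2 <= 0:
        raise ValueError("N must exceed m_full + 1")
    f_ratio = ((r2_full - r2_reduced) / n1) / ((1.0 - r2_full) / n2)
    return f_ratio, n1, n2


# Hypothetical figures: five predictors give R squared = .40,
# a subset of three gives R squared = .30, with N = 100 cases.
f_ratio, n1, n2 = r_increase_f_test(0.40, 0.30, 100, 5, 3)
print(f"F = {f_ratio:.2f} with n1 = {n1} and n2 = {n2}")
```

Note that the residual term in the denominator comes from the larger equation, as in the formula above.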


INTRACLASS CORRELATION 


Suppose we wish to specify the degree of resemblance of twins in terms 
of a correlation coefficient. 


of deciding which member 
which to the other. This ca


family, the degree of resemblance can be specified by the intraclass
correlation coefficient, computable by

$$r' = \frac{s^2_b - s^2_w}{s^2_b + (m - 1)s^2_w} \qquad (15.14)$$

in which we have variance estimates for between families (groups or classes)
and for within families. If $F = s^2_b/s^2_w$ is significant, we have evidence
for a significant positive r'. Note that if there is no within-family
variation, r' becomes unity. Note also that r' may be negative, but since in
practice $s^2_w$ will rarely be significantly larger than $s^2_b$, one is
seldom confronted with the necessity for trying to interpret a negative
intraclass correlation.

When the number of cases per family varies, the average of the $m_g$ values
is used in place of m in the foregoing formula for r'. This does not affect 
the F test as a way of judging the significance of the correlation. 
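
The computation of r' from raw scores may be sketched as follows. This is only an illustration under stated assumptions: the function name and the tiny data set are hypothetical, the between and within estimates are the usual one-way analysis of variance estimates, and the average family size is used for m as described above.

```python
def intraclass_r(groups):
    """Return (r_prime, F) from a list of families, each a list of scores."""
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-families and within-families sums of squares.
    ss_between = sum(len(g) * (mean - grand_mean) ** 2
                     for g, mean in zip(groups, means))
    ss_within = sum((x - mean) ** 2
                    for g, mean in zip(groups, means) for x in g)

    s2_b = ss_between / (n_groups - 1)
    s2_w = ss_within / (n_total - n_groups)
    m = n_total / n_groups          # average number of cases per family

    r_prime = (s2_b - s2_w) / (s2_b + (m - 1) * s2_w)
    f_ratio = s2_b / s2_w if s2_w > 0 else float("inf")
    return r_prime, f_ratio


# Hypothetical scores for two pairs of twins.
r_prime, f_ratio = intraclass_r([[1, 3], [5, 7]])
print(f"r' = {r_prime:.3f}, F = {f_ratio:.1f}")
```

With no within-family variation the function returns r' = 1, in agreement with the note following formula (15.14).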

The distinguishing characteristic of an intraclass correlation situation is 
that we have G sets of scores on just one variable with no way of ordering 
the scores within a set (a sort of interchangeability). It is obvious that r’ 
can be used to describe group resemblance, regardless of how the groups 


have been defined. 


SELECTED CONTRASTS 


When and only when F, as an over-all test, indicates significant differ- 
ences among the G groups may we safely make further tests to see whether 
two selected means differ significantly (here designated as a type D 
contrast) or whether one mean (or the average of two or more group means) 
differs more than chance from the average of other group means (character- 
ized here as a type D' contrast or comparison). We need to distinguish 
between two motivations for making such additional significance tests: 
we may wish to do so because an a priori hypothesis calls for examining 
a given contrast, or, as "data snoopers," we may wish to make certain
comparisons suggested by the data. For the former, a t test is appropriate
whereas for the latter the t test may be misleading in that such snooping is
apt to lead to the selection of those differences that are the largest, a
process which tends to capitalize on chance differences with a resultant
vitiating of the level of significance.
Regardless of motivation, we have to calculate either a D or a D' or both
and an appropriate standard error. Suppose G = 5 groups, with means
$\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4$, and $\bar{X}_5$. For equal m, we
might have $D = \bar{X}_1 - \bar{X}_2$ or
$D' = (\bar{X}_1 + \bar{X}_3 + \bar{X}_4)/3 - (\bar{X}_2 + \bar{X}_5)/2$, but
with unequal $m_g$ the value of D' would be taken as the difference between
weighted averages, thus

$$D' = \frac{m_1\bar{X}_1 + m_3\bar{X}_3 + m_4\bar{X}_4}{m_1 + m_3 + m_4}
     - \frac{m_2\bar{X}_2 + m_5\bar{X}_5}{m_2 + m_5}$$

For the sampling error variances we have

$$s^2_D = s^2_w\left(\frac{1}{m_1} + \frac{1}{m_2}\right)$$

and

$$s^2_{D'} = s^2_w\left(\frac{1}{m_1 + m_3 + m_4} + \frac{1}{m_2 + m_5}\right)$$

If all $m_g = m$, the latter simplifies to

$$s^2_{D'} = \frac{s^2_w}{m}\left(\frac{1}{a} + \frac{1}{b}\right)$$

in which a and b are the number of group means being averaged. Note
that when a = b = 1, the latter yields $s^2_D$ as a special case. The required
standard errors, or $s_D$ and $s_{D'}$ values, are obtained by taking the square
roots of the foregoing variances.


For the a priori hypothesis situation, we have $D/s_D$ and $D'/s_{D'}$ as t
ratios with $\sum m_g - G$ degrees of freedom. The chosen level for judging
significance carries the same connotation as that associated with any other
ordinary t test, provided the comparison was decided on before any of the
data were scrutinized.

For the data-snooping situation, an allowance for the capitalization on
chance-large differences can be made by any of several more or less
satisfactory methods. We present here the Scheffé method because it does
not require equal $m_g$, because it can be used for both D and D' types of
contrasts, because it is robust under nonnormality and heterogeneous
variance conditions, and because it is closely linked with the F test and
requires only the F table.† The Scheffé method involves the computation
of a quantity, which we designate as K, defined as the square root of the
product of $(G - 1)$ times the F required for the $\alpha$ level of significance
for $n_1 = G - 1$ and $n_2 = \sum m_g - G$ degrees of freedom. For example, if
we have adopted .01 as $\alpha$ and have G = 5 and $\sum m_g = 33$, we find K as
$\sqrt{4(4.07)}$, or 4.03. For any contrast to be regarded as significant at
the $\alpha$ level, we must have $D/s_D$ (or $D'/s_{D'}$) equal to or greater
than K. Stated differently, D (or D') must reach $Ks_D$ (or $Ks_{D'}$).
$D \pm Ks_D$ (or $D' \pm Ks_{D'}$) will provide the $1 - \alpha$ confidence
interval for the population values of the specified contrast.


† The frequently advocated Duncan new multiple range test is currently under
suspicion by mathematical statisticians, and the unpublished Tukey test is
valid only for equal $m_g$ and when equality of variance holds. The reader is
referred to the not-easy-to-read discussion by Scheffé of his S-method and
the Tukey T-method in H. Scheffé, The analysis of variance, New York: John
Wiley and Sons, 1959.


Apparently this method is the best yet devised for contrasts or
comparisons of the D' type but for those of the D type it is lacking somewhat
in sensitivity. However, along with this lack of power we have the
advantages listed earlier plus the satisfaction of knowing that its usage
guards against the making of the type I error too frequently when testing
differences suggested by the data. The error rate, using the ordinary t test
for such comparisons, increases astonishingly as G increases.
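
The Scheffé computations can be sketched as below. The tabled F (4.07 for the .01 level with 4 and 28 degrees of freedom) is taken from the example in the text and must be supplied by the user; the function names, the group sizes, the within-groups variance of 20, and the contrast value of 6.0 are hypothetical.

```python
import math


def scheffe_k(n_groups, f_critical):
    """K = sqrt((G - 1) * F), with F read from the table for G - 1 and sum(m) - G df."""
    return math.sqrt((n_groups - 1) * f_critical)


def contrast_standard_error(s2_within, sizes_a, sizes_b):
    """Standard error of a difference between two weighted averages of group means."""
    return math.sqrt(s2_within * (1.0 / sum(sizes_a) + 1.0 / sum(sizes_b)))


def scheffe_check(contrast, std_error, k):
    """Return (is_significant, lower_limit, upper_limit) for the contrast."""
    significant = abs(contrast) / std_error >= k
    return significant, contrast - k * std_error, contrast + k * std_error


# G = 5 and sum of group sizes = 33, as in the text; F(.01; 4, 28) = 4.07.
k = scheffe_k(5, 4.07)
# Hypothetical contrast: groups of 7, 6, 7 cases versus groups of 6 and 7 cases.
se = contrast_standard_error(20.0, [7, 6, 7], [6, 7])
significant, low, high = scheffe_check(6.0, se, k)
print(f"K = {k:.2f}, s = {se:.2f}, significant: {significant}")
print(f"interval: {low:.2f} to {high:.2f}")
```

For a simple D contrast, pass single-element lists of group sizes; the same standard error formula then reduces to the one given for $s^2_D$.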


Chapter 16 


ANALYSIS OF VARIANCE: 
COMPLEX 


amental idea of the analysis of 
pplications to relatively simple 
tuations involved the testing of 
f the means for several groups, 
le classificatory principle. Such 
variable experiments, by which 


cited in Chapter 15 is an example of this. 
There are times when it is not onl 


Way. For example, in order to 

measured intelligence, we may 

Suburban, and rural groups; then, 

we may classify them as to occupational 

on may be by sex or by grade location or by age. Such a procedure in
which one variable is considered at a time is
tantamount to the single variable setup, even though the same batch of
data is made to answer questions about the effects of different independent 
variables. 

Now it is obvious that, in studying factors associated with intelligence, 
we could make a double classification by classifying our cases simul- 
taneously on two of the variables, or a triple classification by using three 
variables, etc. Consider for the moment a double classification based on 
the three rural-urban categories and on sex. This would lead to the assign- 
ing of the cases to six groups, each of which would have a mean IQ. 
Instead of having three means for groupings on the basis of the rural-urban 
characteristic, we would now have two sets of such means, one set for 
each sex. Instead of two means for the total group classified by sex, we 
would have three sets of sex means, a set for each of the three residence 
categories. 

This type of breakdown and similar ones where percentages instead of 
means are involved were utilized in psychological research long before the 
advent of the analysis of variance technique. The further breakdown of 
each sex group for residence status (or of residence groups for sex) is made 
in order to see whether rural-urban differences hold for the sexes separately 
(or whether the sex differences are similar for each of the separate residence 
groups). Although researchers were not confined to the single variable 
approach before the invention of the variance technique, they were defi- 
nitely limited in the possible statistical treatment of their data. Now that 
we have the analysis of variance method, we have an adequate statistical 
technique for checking such hypotheses as can be formulated concerning 
the influence of not only one but two or more variables. The advantages of 
using analysis of variance for such situations may be briefly mentioned. 

First, as we have already seen, it provides an over-all test of the signifi- 
cance of the difference between two or more means when either large or 
small samples are involved. 

Second, we shall soon see that it leads to a definitely improved estimate 
of sampling error when double or triple or higher-order classification is 
involved. For instance, when the older method is used to check the 
significance of the difference between the two sex means for the total 
group, the determination of the sampling error makes no allowance for 
likely heterogeneity in intelligence associated with residence status. The 
variance method permits a refined estimate of error by allowing for varia- 
tion due to one or more variables when the differences between groups 
classified on the basis of some other variable are being tested. 

Third, the variance technique provides a means of testing whether the 
influence of one independent variable on the dependent variable is similar 
for subgroups formed on the basis of a second independent variable. In a
sex-by-residence analysis of IQs, the breakdown of each residence group
by sex will likely show that the sex differences are not exactly the same for
the three groups and that rural-suburban-urban differences are not exactly
alike for the separate sex groups. Such inconsistencies as seem apparent
from examination of the six cell means may not be real for the simple
reason that random sampling errors are present. Before the development
of the variance technique there was no way of testing such apparent
inconsistencies, except when each classificatory characteristic led to just
two categories.


This last point has to do with what has been termed interaction, a concept 
which is not easily understood. Rather than provide a detailed discussion 
now of what is meant by interaction, we will give a simple illustration. 
Suppose it has been found that one learning method has a distinct advan- 
tage over a second method, but that, when the data are broken down for 
two recall intervals, the superiority of the first method seems to hold only 
for those with the shorter recall interval. This failure of the first method 
to be consistently better becomes an example of interaction. Before 
concluding that there is evidence for real interaction, we need to apply 
a statistical test. For such a simple breakdown, we could compute the
difference between the first and second method means, and the standard 
error of the difference, for those with the short recall interval; likewise, 
for those with the long interval; then we could determine the difference 


between the differences and its standard error and therefrom obtain either 
а 2 ora rasa test of inconsisten 


be applied. 
It is the purpose of this chap 


used when classification into g 
variables. These extensions, w 


lying assumptions of normalit 


the available data." 


DOUBLE OR TWO-WAY CLASSIFICATION 


Suppose that the individuals (or their scores) are classifiable into C 
groups on the basis of one characteristi

and third column would be symbolized as $X_{23}$. The general pattern
of labeling the scores is set forth in Table 16.1, which also includes along
the margins a symbol for the several possible row and column means.
Note that the first subscript identifies the row and the second the column
to which a score belongs. The scheme used in denoting means should
be grasped. Thus $\bar{X}_{.2}$ is the mean for the second column, whereas
$\bar{X}_{2.}$ is the mean for the second row. The "dot" in the subscript
indicates the direction of the summing for computing a mean: to get
$\bar{X}_{.2}$ we sum $X_{r2}$ scores with r taking on values running from 1 to R.


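Table 16.1. Schema for labeling scores and means for groups,
double classification

                              Columns
Rows      1          2          3          c          C          Means
1         $X_{11}$   $X_{12}$   $X_{13}$   $X_{1c}$   $X_{1C}$   $\bar{X}_{1.}$
2         $X_{21}$   $X_{22}$   $X_{23}$   $X_{2c}$   $X_{2C}$   $\bar{X}_{2.}$
3         $X_{31}$   $X_{32}$   $X_{33}$   $X_{3c}$   $X_{3C}$   $\bar{X}_{3.}$
r         $X_{r1}$   $X_{r2}$   $X_{r3}$   $X_{rc}$   $X_{rC}$   $\bar{X}_{r.}$
R         $X_{R1}$   $X_{R2}$   $X_{R3}$   $X_{Rc}$   $X_{RC}$   $\bar{X}_{R.}$
Means     $\bar{X}_{.1}$  $\bar{X}_{.2}$  $\bar{X}_{.3}$  $\bar{X}_{.c}$  $\bar{X}_{.C}$  $\bar{X}$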


The deviation of any score, $X_{rc}$, from the total mean can be expressed
in terms of the deviation of its row mean from the total mean,
$(\bar{X}_{r.} - \bar{X})$, plus the deviation of its column mean from the total
mean, $(\bar{X}_{.c} - \bar{X})$, plus a sort of remainder term which represents
an individual variation over and above that due to the groups to which the
score belongs. To secure an expression for this term, we note that by
definition the term must be the part of the score deviation (from the total
mean) left over after the sum of the two parts specified above has been
subtracted. Accordingly, we have

$$(X_{rc} - \bar{X}) - [(\bar{X}_{r.} - \bar{X}) + (\bar{X}_{.c} - \bar{X})]$$

which simplifies to

$$(X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})$$

We may therefore write the following identity:

$$(X_{rc} - \bar{X}) = (\bar{X}_{r.} - \bar{X}) + (\bar{X}_{.c} - \bar{X}) + (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})$$


With r running from 1 to R, and c taking values from 1 to C, there will, 
of course, be RC individual deviations. We need the sum of their squares,
which sum will involve the squares of the three parts, plus three
cross-product terms that can be shown to vanish when summed. It may be
instructive to indicate how the sum of squares for all RC scores can be set
up. Suppose we begin by writing the squares of the deviations for scores
in the first column. Each of these squares will involve cross-product terms,
which we shall here ignore except for a plus sign to indicate their existence.
We have for the first-column scores:

$$(X_{11} - \bar{X})^2 = (\bar{X}_{1.} - \bar{X})^2 + (\bar{X}_{.1} - \bar{X})^2 + (X_{11} - \bar{X}_{1.} - \bar{X}_{.1} + \bar{X})^2 + \cdots$$
$$(X_{21} - \bar{X})^2 = (\bar{X}_{2.} - \bar{X})^2 + (\bar{X}_{.1} - \bar{X})^2 + (X_{21} - \bar{X}_{2.} - \bar{X}_{.1} + \bar{X})^2 + \cdots$$
$$(X_{r1} - \bar{X})^2 = (\bar{X}_{r.} - \bar{X})^2 + (\bar{X}_{.1} - \bar{X})^2 + (X_{r1} - \bar{X}_{r.} - \bar{X}_{.1} + \bar{X})^2 + \cdots$$
$$(X_{R1} - \bar{X})^2 = (\bar{X}_{R.} - \bar{X})^2 + (\bar{X}_{.1} - \bar{X})^2 + (X_{R1} - \bar{X}_{R.} - \bar{X}_{.1} + \bar{X})^2 + \cdots$$

The summing of these squares of deviations for scores of column 1
involves R cases, i.e., r runs from 1 to R; hence we need a symbol which
denotes this fact. Let us use $\sum_r$ for this purpose. Note that the second
term on the right is constant for all R scores, which permits us to replace
the summation sign by R.

The sum of the first column squares, and by analogy the sums for the
other columns, can be written as:

1st col.:
$$\sum_r (X_{r1} - \bar{X})^2 = \sum_r (\bar{X}_{r.} - \bar{X})^2 + R(\bar{X}_{.1} - \bar{X})^2 + \sum_r (X_{r1} - \bar{X}_{r.} - \bar{X}_{.1} + \bar{X})^2$$

2nd col.:
$$\sum_r (X_{r2} - \bar{X})^2 = \sum_r (\bar{X}_{r.} - \bar{X})^2 + R(\bar{X}_{.2} - \bar{X})^2 + \sum_r (X_{r2} - \bar{X}_{r.} - \bar{X}_{.2} + \bar{X})^2$$

cth col.:
$$\sum_r (X_{rc} - \bar{X})^2 = \sum_r (\bar{X}_{r.} - \bar{X})^2 + R(\bar{X}_{.c} - \bar{X})^2 + \sum_r (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})^2$$

Cth col.:
$$\sum_r (X_{rC} - \bar{X})^2 = \sum_r (\bar{X}_{r.} - \bar{X})^2 + R(\bar{X}_{.C} - \bar{X})^2 + \sum_r (X_{rC} - \bar{X}_{r.} - \bar{X}_{.C} + \bar{X})^2$$

$$\sum_c\sum_r (X_{rc} - \bar{X})^2 = C\sum_r (\bar{X}_{r.} - \bar{X})^2 + R\sum_c (\bar{X}_{.c} - \bar{X})^2 + \sum_c\sum_r (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})^2 \qquad (16.1)$$

The reader who is worried about whether the cross-product terms really
vanish should note that for the cth column the product term

$$\sum_r (\bar{X}_{r.} - \bar{X})(\bar{X}_{.c} - \bar{X}) = (\bar{X}_{.c} - \bar{X})\sum_r (\bar{X}_{r.} - \bar{X})$$

vanishes because $\sum_r (\bar{X}_{r.} - \bar{X}) = 0$. The other two cross-product
sums have as one factor the remainder or residual term; we have already had
examples of a general principle that product terms involving residuals
vanish.

From formula (16.1) we see that the total sum of squares can be broken 
into three additive components: between row means with R — 1 degrees 
of freedom, between column means with df of C — 1, and a remainder. 
The degrees of freedom for the last part can be ascertained by a principle 
analogous to that used for getting the $\chi^2$ df for contingency tables. The
marginal means constitute restrictions on the deviation score entries in the 
rows and columns—when deviation scores for (R — 1)(C — 1) cells are 
filled in, the rest of the entries become fixed; hence df = (R — 1)(C — 1). 
Note that the dfs for the three parts sum to the df for the total sum of 
squares or RC — 1. 
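
The additivity of the three components and of their dfs can be checked numerically. The following sketch assumes one score per cell; the function name and the small 3 by 3 table are hypothetical.

```python
def two_way_sums_of_squares(table):
    """Split the total sum of squares of an R x C table (one score per cell)."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[c] for row in table) / n_rows for c in range(n_cols)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    # Remainder: what is left of each deviation after row and column parts.
    ss_remainder = sum(
        (table[r][c] - row_means[r] - col_means[c] + grand) ** 2
        for r in range(n_rows) for c in range(n_cols)
    )
    dfs = {"rows": n_rows - 1, "cols": n_cols - 1,
           "remainder": (n_rows - 1) * (n_cols - 1),
           "total": n_rows * n_cols - 1}
    return ss_total, ss_rows, ss_cols, ss_remainder, dfs


# Hypothetical scores, R = 3 rows by C = 3 columns.
scores = [[1, 2, 3], [2, 4, 6], [3, 5, 10]]
total, rows, cols, remainder, dfs = two_way_sums_of_squares(scores)
print(total, rows + cols + remainder)   # the two figures agree within rounding
print(dfs)
```

Because the remainder is computed directly rather than by subtraction, agreement of the two printed figures is a genuine check that the cross-product terms vanish.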

Dividing the three sums of squares by their dfs leads to three variance
estimates, $s^2_r$ for that based on rows, $s^2_c$ for columns, and $s^2_{rc}$
for that based on the remainder, sometimes called error, sum of squares. We
have two
null hypotheses: that the row means are chance variations from one 
population mean, and that the column means are also variations from 
one population mean. As in the simpler situation, if the estimate based 
on rows is larger than expected on the basis of chance, it follows that there 
are real differences between the population means for the groups defined 
by the rows; likewise, for column means. 

In testing the significance of either of the two between-groups variances 
when the RC scores belong to RC individuals, we use the remainder 
variance estimate as the denominator of the F ratio. This involves an 
assumption, to be discussed under the heading "Choice of error term,"
p. 309. For testing the variation of row means, we have $F = s^2_r/s^2_{rc}$,
with $n_1 = R - 1$ and $n_2 = (R - 1)(C - 1)$. For column means,
$F = s^2_c/s^2_{rc}$, with $n_1 = C - 1$ and $n_2 = (R - 1)(C - 1)$. If an F so
defined happens
to be less than unity, we know at once without reference to the table for F 
that the variations of the given means are insignificant. Note that, since 
the error variance used in the denominator is a residual after the parts of the 
total associated with between-row and between-column variations have 
been subtracted, it follows that we are using as our error term a variance 
which has been freed of the influence of heterogeneity with respect to the 
two classificatory variables being investigated. 
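
The two F ratios just described may be sketched as follows, again for one score per cell and with the remainder estimate as the error term. The function name and the 3 by 3 table of scores are hypothetical; significance is still to be judged from the F table.

```python
def two_way_f_ratios(table):
    """Return (F_rows, F_cols, n1_rows, n1_cols, n2) for an R x C table."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[c] for row in table) / n_rows for c in range(n_cols)]

    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_rem = sum(
        (table[r][c] - row_means[r] - col_means[c] + grand) ** 2
        for r in range(n_rows) for c in range(n_cols)
    )

    n1_rows, n1_cols = n_rows - 1, n_cols - 1
    n2 = (n_rows - 1) * (n_cols - 1)
    s2_r = ss_rows / n1_rows      # estimate based on rows
    s2_c = ss_cols / n1_cols      # estimate based on columns
    s2_rc = ss_rem / n2           # remainder (error) estimate
    return s2_r / s2_rc, s2_c / s2_rc, n1_rows, n1_cols, n2


# Hypothetical scores, R = 3 rows by C = 3 columns.
f_rows, f_cols, n1_rows, n1_cols, n2 = two_way_f_ratios(
    [[1, 2, 3], [2, 4, 6], [3, 5, 10]]
)
print(f"rows:    F = {f_rows:.2f} with n1 = {n1_rows}, n2 = {n2}")
print(f"columns: F = {f_cols:.2f} with n1 = {n1_cols}, n2 = {n2}")
```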
but for other bases for forming groups there are
definite limits on the number of 


reaction time the number of po: 


SIGNIFICANCE OF THE DIFFERENCES BETWEEN
CORRELATED MEANS


the limiting case of C = 2. 
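$\sum_r\sum_c (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})^2$, may be written as

$$\sum_r (X_{r1} - \bar{X}_{r.} - \bar{X}_{.1} + \bar{X})^2 + \sum_r (X_{r2} - \bar{X}_{r.} - \bar{X}_{.2} + \bar{X})^2$$

must be the average of the two
column means, or $\bar{X} = (\bar{X}_{.1} + \bar{X}_{.2})/2$. Making these
substitutions, we have

$$\sum_r \left(X_{r1} - \frac{X_{r1} + X_{r2}}{2} - \bar{X}_{.1} + \frac{\bar{X}_{.1} + \bar{X}_{.2}}{2}\right)^2
+ \sum_r \left(X_{r2} - \frac{X_{r1} + X_{r2}}{2} - \bar{X}_{.2} + \frac{\bar{X}_{.1} + \bar{X}_{.2}}{2}\right)^2$$

which simplifies to

$$\tfrac{1}{4}\sum_r (X_{r1} - X_{r2} - \bar{X}_{.1} + \bar{X}_{.2})^2 + \tfrac{1}{4}\sum_r (X_{r2} - X_{r1} - \bar{X}_{.2} + \bar{X}_{.1})^2$$

These two terms become identical when we change the signs within the
second parentheses, which change is permissible since the square of a
function is the same as the square of its negative, e.g., $(a)^2 = (-a)^2$.
Hence we have

$$\tfrac{1}{2}\sum_r [(X_{r1} - X_{r2}) - (\bar{X}_{.1} - \bar{X}_{.2})]^2$$

Now the first parentheses term is the difference between any individual's
two scores, say $D_r$, and the second is the difference between the two
column means, which difference it will be recalled is the same as the mean
of the differences, $\bar{D}$. We have finally the remainder sum of squares as
$\tfrac{1}{2}\sum_r (D_r - \bar{D})^2$, or one-half the sum of the squares of the
difference scores about the mean difference.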
The F for comparing two column means becomes

$$F = \frac{s^2_c}{s^2_{rc}} = \frac{\dfrac{R}{2}(\bar{X}_{.1} - \bar{X}_{.2})^2 \Big/ 1}{\tfrac{1}{2}\sum_r (D_r - \bar{D})^2 \Big/ (R - 1)}$$

with $n_1 = 1$ and $n_2 = R - 1$. This reduces to

$$F = \frac{(\bar{X}_{.1} - \bar{X}_{.2})^2}{\sum_r (D_r - \bar{D})^2 \Big/ R(R - 1)}$$

which the reader will recognize as $t^2$ for comparing the difference between
means based on sets of correlated scores with the standard error of the
mean difference estimated by formula (7.1).
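
The identity of this F with the square of the t for correlated means is easily verified numerically. In the sketch below the function name and the five pairs of scores are hypothetical; the column and remainder estimates follow the expressions given above.

```python
import math


def paired_f_and_t(first, second):
    """Return (F, t) for two sets of correlated scores on the same R cases."""
    n_pairs = len(first)
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = sum(diffs) / n_pairs          # equals the difference in column means
    ss_diff = sum((d - mean_diff) ** 2 for d in diffs)

    # Between-columns estimate (df = 1) and remainder estimate (df = R - 1).
    s2_c = n_pairs * mean_diff ** 2 / 2
    s2_rc = 0.5 * ss_diff / (n_pairs - 1)
    f_ratio = s2_c / s2_rc

    # Ordinary t for the mean of the difference scores.
    t_ratio = mean_diff / math.sqrt(ss_diff / (n_pairs * (n_pairs - 1)))
    return f_ratio, t_ratio


# Hypothetical scores for five individuals under two conditions.
f_ratio, t_ratio = paired_f_and_t([12, 15, 11, 18, 14], [10, 14, 12, 15, 11])
print(f"F = {f_ratio:.4f}, t squared = {t_ratio ** 2:.4f}")
```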

We have seen in Chapter 6 that in testing the difference between the 
means of correlated scores we can, for the large sample situation, determine 
the needed sampling error either from the distribution of differences 
between paired scores or by means of the standard error of the difference 
formula with the correlational term included. The important thing to note 
is that the analysis of variance technique provides a method for testing 
the significance of the difference between two or more means based on sets 
of correlated scores. The scores may be correlated either because they are 
based on the same individuals working under C conditions or having C 
trials on some stunt, or because siblings or litter mates are involved
(each of the C groups containing one case from each of R families), or
because we started with R sets of matched individuals, one from each set 
being assigned to the several C groups. 

The F just discussed has to do with column means. What of the row 
means for the given setup? The means of the R rows represent the mean
performance of each of the several individuals, and a test of the significance
of the estimate of variance based on the between-row sum of squares
becomes a test of the significance of individual differences. Since it is
known that individuals do differ on practically all psychological variables,
such a test is usually a trivial test of the obvious, and hence it is seldom
needed. We may, however, have the situation in which we wonder
whether individual variation is significant in the light of known measure- 
ment or response errors. To this question we now turn. 


RELIABILITY OF MEASUREMENT 


Suppose the scores in each г 
individual on different forms 
variable. The column means 


rey 4S Outlined previously, provides 
these correlated means. 


Б case of C = 2; e.g., two forms of 

individuals. Now on page 295 it was 

squares reduces to $\tfrac{1}{2}\sum_r (D_r - \bar{D})^2$ in which $D_r$, a
difference score, is expressed as a deviation from the mean of the R
differences; hence this term is $\tfrac{1}{2}\sum d^2$ (cf. p. 80). Since
$S^2_D = \sum d^2/R$, the term becomes $\tfrac{1}{2}RS^2_D$. When we recall the
expression for the variance of difference scores for correlated values (6.6),
it is readily seen that the remainder sum of squares can be written as

$$\tfrac{1}{2}\sum_r (D_r - \bar{D})^2 = \frac{R}{2}\left(S^2_1 + S^2_2 - 2r_{12}S_1S_2\right)$$

If we make the usual assumption that the two forms have the same
variance, we can let $S^2_1 = S^2_2 = S^2_x$. Then noting that $r_{12}$ is the
form versus form reliability coefficient, $r_{xx}$, we can substitute and get

$$\frac{R}{2}\left(2S^2_x - 2r_{xx}S^2_x\right) = RS^2_x(1 - r_{xx}) = RS^2_e$$

where $S^2_e$ is the error of measurement variance (see p. 147). When we
divide this remainder sum of squares, $RS^2_e$, by its df, we have the
previously labeled $s^2_{rc} = RS^2_e/(R - 1)$. Now let us recall the general
relationship between $S^2$ and $s^2$ as biased and unbiased estimates of
variance. For N Y (say) scores, $S^2_y = \sum y^2/N$, and
$s^2_y = \sum y^2/(N - 1)$. From the definition of $S^2_y$ we have
$\sum y^2 = NS^2_y$, hence $s^2_y = NS^2_y/(N - 1)$. Accordingly, with $S^2_e$ as
the error of measurement variance, it is seen that $s^2_{rc}$ is an unbiased
estimate of the error of measurement variance. We label this estimate
$s^2_e$.

Thus, under the usual assumption of equal form variances, the remain- 
der sum of squares and the variance estimate based thereon has to do with 
errors of measurement. The remainder sum of squares as actually com- 
puted includes an adjustment for possibly differing form means but does 
not allow for any difference in form variances. (It will be recalled that
$S^2_e$ computed via $r_{xx}$ is also unaffected by a difference in form means.)
If we have C = 3 or more forms, the remainder term is likewise a base for
an unbiased estimate of error of measurement variance. When we test the 
difference between row means, we are actually asking whether the individual 
differences are significant in light of the variability due to measurement 


errors. 

In our earlier discussion of reliability (pp. 145-52), nothing was said
about unbiased estimates. Admitting that $s^2_e$ represents a slight
improvement over the biased $S^2_e$, we next ask whether the use of unbiased
estimates leads to an improvement in the estimate of $r_{xx}$. Reliability is
sometimes defined by way of formula (10.15), i.e., $r_{xx} = 1 - S^2_e/S^2_x$,
or in population values as $r_{xx(\text{pop})} = 1 - \sigma^2_e/\sigma^2_x$. This
definition formula can become a computational formula provided we have a
means of estimating $\sigma^2_e$ and $\sigma^2_x$. If we were to plug in the
unbiased estimates, $s^2_e$ and $s^2_x$ (the latter as an estimate of the
common form variance but based on the scores from one form only), we would
have

$$r_{xx} = 1 - \frac{s^2_e}{s^2_x} = 1 - \frac{RS^2_e/(R - 1)}{RS^2_x/(R - 1)} = 1 - \frac{S^2_e}{S^2_x}$$

from which we see that, when estimating $r_{xx}$ via variances in the two form
situation, it matters not whether we use unbiased or biased estimates of
the variances.
In some areas of psychology it is feasible to approach the reliability
problem by way of m repeated measurements on each of N individuals.
Feasibility depends on absence of practice or fatigue effects, permitting
us to ignore the ordering of the measuring, as first, or second, and so on.
The m measurements for each person are averaged; we wish to know 
the reliability of these average scores, the coefficient for which we will 
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designate as r;,. By definition, г = 1 — о/о? in which об is the 
true (population) variance for the measurement errors associated with 
the “score” as an average of m measures and о? is the variance of the 
distribution of the scores as averages (the subscript i indicating that the 
ith person has such a mean score). These two variances would need to be 
estimated from the available data consisting of m measures on each of N 
individuals. The mean score for a person will have a sampling error since 
it is based on a sample of m measures. The square of the estimated 
standard error of the mean score for the ith individual would be s? 


eli) 
82 от in which 5*,(, is the unbiased estimate of th 


e score (single, not 
average) variance within individual i. If we assume that this within 


individual score variance is the same from individual to individual, a 
better estimate of the within person variance will be obtained by averaging 
the separate N estimates. But this is exactly the meaning of s?,, (see p. 255), 
placed now in the context of within persons instead of within groups. 
(The N individuals provide N sets, or groups, of m scores each.) We will 
therefore take as our estimate of the error variance for the scores as 
averages, 52,,,, = s? [т. 

To secure an estimate of $\sigma^2_{\bar{x}_i}$, the variance over persons of
their scores as averages, we note that since the between groups estimate
$s^2_b = m s^2_{\bar{x}}$, we have $s^2_{\bar{x}} = s^2_b/m$. For the between
persons situation, this could be written as $s^2_{\bar{x}_i} = s^2_b/m$, which
we will take as an unbiased estimate of $\sigma^2_{\bar{x}_i}$. Substituting
in the definition formula, we have for the reliability of the scores as
averages,

$$r_{\bar{x}\bar{x}} = 1 - \frac{s^2_w/m}{s^2_b/m} = 1 - \frac{s^2_w}{s^2_b} \qquad (16.2)$$

which is based on unbiased estimates [...] Chapter 15.


The foregoing was concerned with the reliability of scores, $\bar{X}_i$, each
taken as the average of m scores or m Xs. What of the reliability of these
X scores? Recall that when we have one score per person, it can be
expressed as $X = X_t + E$ and that for N infinitely large, we have
$\sigma^2_x = \sigma^2_t + \sigma^2_e$. The reliability of the X scores,

$$\rho_{xx(\mathrm{pop})} = \frac{\sigma^2_t}{\sigma^2_x}$$


An estimate of $r_{xx}$ can be obtained by replacing $\sigma^2_t$ and
$\sigma^2_x$ by their best available estimates. With m measures on each of N
persons, we still have (ignoring order of measurement) a one-way analysis of
variance setup, random model. The within groups estimate, $s^2_w$, as a
within persons estimate would be an estimate of $\sigma^2_e$ and the between
groups estimate, $s^2_b$, would be an estimate of between individual
variation. We could replace

$$s^2_b \rightarrow \sigma^2 + m\sigma^2_{\mu}$$

by

$$s^2_b \rightarrow \sigma^2_e + m\sigma^2_t$$

in which $\mu_i$ as the population mean for the ith individual is the average
of m measures when m is infinitely large; that is, $\mu_i$ is $X_t$ for the
individual, and $\sigma^2_{\mu}$ is $\sigma^2_t$.

It will be recalled that when we say the expected value of $s^2_b$ is the sum
of the two terms on the right of the arrow, we are saying that on the average,
$s^2_b = \sigma^2_e + m\sigma^2_t$. If we solve for $\sigma^2_t$, and
substitute the estimate, $s^2_w$, for $\sigma^2_e$, we have

$$s^2_t = \frac{s^2_b - s^2_w}{m}$$

as an estimate of the variance of true scores.

To obtain an estimate of $\sigma^2_x$, we substitute the foregoing value of
$\sigma^2_t$ and the estimate of $\sigma^2_e$, given by $s^2_w$, into
$\sigma^2_x = \sigma^2_t + \sigma^2_e$. These substitutions lead to

$$s^2_x = \frac{s^2_b - s^2_w}{m} + s^2_w = \frac{s^2_b - s^2_w + m s^2_w}{m}$$

which can be written as

$$s^2_x = \frac{s^2_b + (m-1)s^2_w}{m}$$

Then substituting the above estimates into $r_{xx} = \sigma^2_t/\sigma^2_x$ we
have the estimate

$$r_{xx} = \frac{(s^2_b - s^2_w)/m}{[s^2_b + (m-1)s^2_w]/m} = \frac{s^2_b - s^2_w}{s^2_b + (m-1)s^2_w} \qquad (16.3)$$

which the alert student will recognize as r', the intraclass correlation
coefficient. We have, therefore, incidentally provided a derivation of the
r' defined on p. 285. We see that r', in the situation involving m measures
on each of N persons, is a measure of the reliability of the X scores, when
taken singly.
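
As a check on the algebra, formulas (16.2) and (16.3) can be worked out for a
small set of scores. The following Python sketch is an illustration only: the
data (four persons, three measures each) and the function names are
hypothetical and do not come from the text. It also applies the generalized
Brown-Spearman formula, $m r/[1 + (m-1)r]$, to the single-score coefficient,
on the assumption that the result should agree with (16.2).

```python
def variance_estimates(scores):
    """Return (s2_b, s2_w) for N persons, each with m measures."""
    n_persons = len(scores)
    m = len(scores[0])
    grand_mean = sum(sum(person) for person in scores) / (n_persons * m)
    person_means = [sum(person) / m for person in scores]
    # Between-persons sum of squares: m times the squared deviations of means.
    ss_between = m * sum((mean - grand_mean) ** 2 for mean in person_means)
    # Within-persons sum of squares: deviations of scores about own mean.
    ss_within = sum(
        (x - mean) ** 2
        for person, mean in zip(scores, person_means)
        for x in person
    )
    return ss_between / (n_persons - 1), ss_within / (n_persons * (m - 1))


def reliability_of_averages(s2_b, s2_w):
    """Formula (16.2): reliability of each person's average of m measures."""
    return 1.0 - s2_w / s2_b


def reliability_of_single_scores(s2_b, s2_w, m):
    """Formula (16.3): intraclass r, reliability of a single measure."""
    return (s2_b - s2_w) / (s2_b + (m - 1) * s2_w)


def brown_spearman(r_single, m):
    """Generalized Brown-Spearman formula for an average of m measures."""
    return m * r_single / (1.0 + (m - 1) * r_single)


# Hypothetical data: 4 persons, 3 measures each.
scores = [[10, 12, 11], [15, 14, 16], [20, 19, 21], [8, 9, 10]]
m = 3
s2_b, s2_w = variance_estimates(scores)
r_avg = reliability_of_averages(s2_b, s2_w)
r_single = reliability_of_single_scores(s2_b, s2_w, m)
print("s2_b, s2_w:", s2_b, s2_w)
print("r for single scores (16.3):", r_single)
print("r for averages (16.2):", r_avg)
print("Brown-Spearman from (16.3):", brown_spearman(r_single, m))
```

The last two printed values should coincide, whatever the data, so long as
every person has the same number of measures.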

Earlier (p. 208) we derived the generalized Brown-Spearman formula
as a coefficient for the reliability of a sum (or average) of m scores:

$$r_{\bar{x}\bar{x}} = \frac{m r_{xx}}{1 + (m-1) r_{xx}}$$

Let us substitute $r_{xx}$ from equation (16.3), with $s^2_b$ standing for the
between persons estimate:

$$r_{\bar{x}\bar{x}} = \frac{m\,\dfrac{s^2_b - s^2_w}{s^2_b + (m-1)s^2_w}}{1 + (m-1)\dfrac{s^2_b - s^2_w}{s^2_b + (m-1)s^2_w}} = \frac{m(s^2_b - s^2_w)}{s^2_b + (m-1)s^2_w + (m-1)s^2_b - (m-1)s^2_w} = \frac{m(s^2_b - s^2_w)}{m s^2_b} = 1 - \frac{s^2_w}{s^2_b}$$

Thus the Brown-Spearman formula leads to the value given by formula
(16.2).

Since the reliability [...] to the observed trait variance, [...]
the variance estimate based on the remainder sum of squares may be the
error variance even for those situations where we have classifications into 
R groups rather than as R individuals, but as will presently be seen the
term which we are now calling the remainder may not always be the one
to utilize as "error." The within-groups variance estimate of Chapter 15
was an "error" variance for testing the significance of the between-groups
variation. In more complex setups in the analysis of variance, rules are 
required for choosing the appropriate error term. 


COMPUTATIONAL ILLUSTRATION 


The required computations for testing variation between column means 
and between row means will now be set forth. It makes no difference in 
the computational procedure whether we have RC individuals classified 
into R groups one way and C groups another way or R individuals with 
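C scores each or R sets of C individuals matched or RC scores for just one
individual.

The computation of the required sums of squares involves an extension
of formulas (15.6), (15.7), and (15.8):

$$\sum\sum (X_{rc} - \bar{X})^2 = \frac{1}{RC}\Big[RC\sum\sum X^2_{rc} - \Big(\sum\sum X_{rc}\Big)^2\Big] \quad \text{for total} \qquad (16.4)$$

$$R\sum_c (\bar{X}_{\cdot c} - \bar{X})^2 = \frac{1}{RC}\Big[C\sum_c\Big(\sum_r X_{rc}\Big)^2 - \Big(\sum\sum X_{rc}\Big)^2\Big] \quad \text{for columns} \qquad (16.5)$$

$$C\sum_r (\bar{X}_{r\cdot} - \bar{X})^2 = \frac{1}{RC}\Big[R\sum_r\Big(\sum_c X_{rc}\Big)^2 - \Big(\sum\sum X_{rc}\Big)^2\Big] \quad \text{for rows} \qquad (16.6)$$

The sum of squares for the remainder can be obtained by subtracting the
sums for between columns and for between rows from the total sum of
squares. Formulas (16.4-16.6) may look forbidding at first, but actually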
the sums based on raw scores are easily secured by following a plan on the 
work sheet. Sum each row, and write the sums on the right-hand margin; 
sum each column, and write the sums along the bottom margin. Summing 
down the right-hand margin gives the total sum, and summing across 
the bottom margin should give the same total sum. Square all scores and 
sum to get the first sum in (16.4); square all the right-hand margin sums 
and then sum to get the first part of (16.6); square all the bottom margin 
sums and then sum to get the first part of (16.5). 

The student may do well to sit down at a calculator and perform these
operations with the scores in Table 16.2, which contains visual acuity data
on four (= R) individuals for three (= C) distances of the stimulus from
the eye. Casual examination of the table indicates that acuity measures are
influenced by distance. Do the means for the three distances differ
significantly?

Table 16.2. Data for visual acuity, 4 individuals, 3 distances
(Monocular, vernier method, coded scores) *

                 Distance (in Meters)
Subjects         5       10      15      Sum     Mean
1               13       29      17       59     19.7
2                4        9      19       32     10.7
3                8       30      37       75     25.0
4                9       27      53       89     29.7
Sum             34       95     126      255
Mean           8.5     23.7    31.5              21.25

$\sum\sum X_{rc} = 255$                       $\sum\sum X^2_{rc} = 7709$
$\sum_r(\sum_c X_{rc})^2 = 18,051$            $\sum_c(\sum_r X_{rc})^2 = 26,057$

* From Walker, E. L., Factors in vernier acuity and distance discrimination,
Doctoral Dissertation, Stanford University, California, 1947.

The required sums are also included in the table. Substituting these in
the foregoing formulas gives:

$\tfrac{1}{12}[12(7709) - (255)^2] = 2290.25$ for the total sum of squares
$\tfrac{1}{12}[3(26,057) - (255)^2] = 1095.50$ for between-columns sum of squares
$\tfrac{1}{12}[4(18,051) - (255)^2] = 598.25$ for between-rows sum of squares

Subtracting the sum of the last two from the total gives 596.50 as the
remainder sum of squares.


Table 16.3. Variance table for data of Table 16.2

                Sum of                Variance
Source          Squares       df      Estimate
Distance        1095.50        2       547.75
Subjects         598.25        3       199.42
Remainder        596.50        6        99.42
Total           2290.25       11

For the influence of distance we have F = 547.75/99.42 = 5.51, which for
2 and 6 degrees of freedom is significant at slightly better than the
P = .05 level (additional data in Walker's dissertation leave no doubt:
distance does have an effect). This is a situation in which experimentally
induced differences are so large that they can be demonstrated with only
four cases.
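
The work-sheet plan described above can be sketched in a few lines of
Python. The function name and the dictionary keys are our own labels, not the
text's; the scores are those of Table 16.2, and the arithmetic follows
formulas (16.4) to (16.6) with the remainder obtained by subtraction.

```python
def two_way_one_score_per_cell(table):
    """Sums of squares and variance estimates for an R x C table."""
    n_rows = len(table)
    n_cols = len(table[0])
    n = n_rows * n_cols
    row_sums = [sum(row) for row in table]                 # right-hand margin
    col_sums = [sum(row[c] for row in table) for c in range(n_cols)]  # bottom
    total = sum(row_sums)
    sum_sq = sum(x * x for row in table for x in row)

    ss_total = (n * sum_sq - total ** 2) / n                              # (16.4)
    ss_cols = (n_cols * sum(s * s for s in col_sums) - total ** 2) / n    # (16.5)
    ss_rows = (n_rows * sum(s * s for s in row_sums) - total ** 2) / n    # (16.6)
    ss_rem = ss_total - ss_cols - ss_rows

    df_cols, df_rows = n_cols - 1, n_rows - 1
    df_rem = df_cols * df_rows
    return {
        "ss": {"columns": ss_cols, "rows": ss_rows,
               "remainder": ss_rem, "total": ss_total},
        "variance": {"columns": ss_cols / df_cols,
                     "rows": ss_rows / df_rows,
                     "remainder": ss_rem / df_rem},
    }


# Table 16.2: four subjects (rows) by three distances (columns).
acuity = [
    [13, 29, 17],
    [4, 9, 19],
    [8, 30, 37],
    [9, 27, 53],
]
result = two_way_one_score_per_cell(acuity)
f_distance = result["variance"]["columns"] / result["variance"]["remainder"]
print(result["ss"])
print(result["variance"])
print("F for distance:", round(f_distance, 2))
```

The printed sums of squares and variance estimates can be compared with
Table 16.3.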


DOUBLE CLASSIFICATION WITH MORE THAN ONE 
SCORE PER CELL 


Suppose that we have m scores in each cell of schematic Table 16.1.
This would lead to a mean for each cell, and about each such mean we
would have the variation of m scores. The mean for the rth row would be
the mean of all mC scores in the row, or the mean of the C cell means of the
row; the mean of the cth column would be the mean of the mR scores in
the column, or the mean of the cell means in the column; in the remainder
term, previously defined as
$(X_{rc} - \bar{X}_{r\cdot} - \bar{X}_{\cdot c} + \bar{X})$, we would replace
$X_{rc}$ by $\bar{X}_{rc}$. The total sum of squares for all mRC scores would
include a between-column, a between-row, and a remainder component, plus an
additional part which would involve the variation within cells about the
cell means. A convenient label for this new part would be
$\sum\sum (X_{rc} - \bar{X}_{rc})^2$, in which it is understood that there
are m such deviations in each cell. A more precise notation would be
$\sum\sum\sum (X_{irc} - \bar{X}_{rc})^2$, in which $X_{irc}$ is the
ith score in the cell involving the rth row and the cth column.


Table 16.4. Variance schema for double classification with m scores per cell

                                                                               Variance
Source         Sum of Squares                                                  df              Estimate
Rows           $mC\sum(\bar{X}_{r\cdot} - \bar{X})^2$                                R - 1           $s^2_r$
Columns        $mR\sum(\bar{X}_{\cdot c} - \bar{X})^2$                               C - 1           $s^2_c$
Interaction    $m\sum\sum(\bar{X}_{rc} - \bar{X}_{r\cdot} - \bar{X}_{\cdot c} + \bar{X})^2$    (R - 1)(C - 1)  $s^2_{rc}$
Within cells   $\sum\sum(X_{rc} - \bar{X}_{rc})^2$                                   mRC - RC        $s^2_w$
Total          $\sum\sum(X_{rc} - \bar{X})^2$                                        mRC - 1


The variance table would take on the form indicated in Table 16.4, in
which the term "remainder" has been replaced by "interaction." Note
that the first two sums of squares are simply m times the corresponding
sums for one score per cell, and that the dfs for these sums and for the one
corresponding to the remainder sum are not changed. The df for the
within-cells sum depends on the fact that there are m - 1 degrees of
freedom in each of the RC cells, which gives RC(m - 1) = mRC - RC
as the df. We now have four estimates, $s^2_r$, $s^2_c$, $s^2_{rc}$,
$s^2_w$, of variance.

This simple modification [...] $\sum X$ and $\sum X^2$ [...] gives
$\sum\sum X^2$ as the sum of all the [...]

The computational formulas are:

$$\text{Total sum of squares} = \frac{1}{mRC}\Big[mRC\sum\sum X^2_{rc} - \Big(\sum\sum X_{rc}\Big)^2\Big] \qquad (16.7)$$

$$\text{Between-rows squares} = \frac{1}{mRC}\Big[R\sum_r\Big(\sum X_{r\cdot}\Big)^2 - \Big(\sum\sum X_{rc}\Big)^2\Big] \qquad (16.8)$$

$$\text{Between-columns squares} = \frac{1}{mRC}\Big[C\sum_c\Big(\sum X_{\cdot c}\Big)^2 - \Big(\sum\sum X_{rc}\Big)^2\Big] \qquad (16.9)$$

$$\text{Within-cells squares} = \frac{1}{m}\Big[m\sum\sum X^2_{rc} - \sum\sum\Big(\sum X_{rc}\Big)^2\Big] \qquad (16.10)$$

The interaction sum of squares is obtained as the remainder when the
numerical values of the last three are subtracted from the total sum of
squares.

[...] (= m) cases. The scores are [...] Table 16.6 is a work-sheet layout
[...] sums of squared scores, and means, [...]

Total: $\tfrac{1}{80}[80(7835) - (735)^2] = 1082.1875$.

Rows: $\tfrac{1}{80}[2(436^2 + 299^2) - (735)^2] = 234.6125$.

Columns: $\tfrac{1}{80}[2(341^2 + 394^2) - (735)^2] = 35.1125$.

Within cells: $\tfrac{1}{20}[20(7835) - (217^2 + 219^2 + 124^2 + 175^2)] = 782.4500$.

Interaction: 1082.1875 - (234.6125 + 35.1125 + 782.4500) = 30.0125.

Table 16.5. Coded learning scores (sum of scores on 29th and
30th trials) for Koerth pursuit rotor*


                         Practice Sessions
Rest
Interval       5 (M T W Th F)         3 (M W F)
                9  14   6  10         8  10  11  14
               10  15  10  11         9   7   9  10
3 minutes      14  17  10  11         9  12  13  14
               10   7   8  15        12  13   7  17
               12   8  14   6         9  12   8  15
2 6 I 9 m 142 9 7 
5 9 2 1 9 611 9 
1 тілше 14 1 E. 6 8 11 12 
14 4 I] 5 9 7 4 10 
6 9 2 35 3 6 7 8 


* Data from Renshaw, M. J., The effects of varied arrangements of practice and rest 
on proficiency in the acquisition of a motor skill, Unpublished Doctor's Dissertation, 
Stanford University, California, 1947. 


Table 16.6. Sums and means for data of Table 16.5


                              Practice Session
Rest
Interval       5 (M T W Th F)         3 (M W F)              Totals
               $\sum X = 217$         $\sum X = 219$         $\sum X = 436$
3 minutes      $\sum X^2 = 2543$      $\sum X^2 = 2547$      $\sum X^2 = 5090$
               $\bar{X} = 10.8500$    $\bar{X} = 10.9500$    $\bar{X} = 10.9000$
               $\sum X = 124$         $\sum X = 175$         $\sum X = 299$
1 minute       $\sum X^2 = 1102$      $\sum X^2 = 1643$      $\sum X^2 = 2745$
               $\bar{X} = 6.2000$     $\bar{X} = 8.7500$     $\bar{X} = 7.4750$
               $\sum X = 341$         $\sum X = 394$         $\sum\sum X = 735$
Totals         $\sum X^2 = 3645$      $\sum X^2 = 4190$      $\sum\sum X^2 = 7835$
               $\bar{X} = 8.5250$     $\bar{X} = 9.8500$     $\bar{X} = 9.1875$


The interaction sum of squares can also be calculated by direct
substitution into the definition formula of Table 16.4, which will involve RC
quantities to be squared, summed, and multiplied by m. We have

$(10.85 - 10.90 - 8.525 + 9.1875)^2 = (.6125)^2$
$(10.95 - 10.90 - 9.85 + 9.1875)^2 = (-.6125)^2$
$(6.20 - 7.475 - 8.525 + 9.1875)^2 = (-.6125)^2$
$(8.75 - 7.475 - 9.85 + 9.1875)^2 = (.6125)^2$

which when added and multiplied by 20 lead to 30.0125, or the value
obtained by subtraction.

Any reader who is surprised [...] term is (2 - 1)(2 - 1) or 1.

Actually, the easiest way to compute the interaction sum of squares for a
2 by 2 table is to work with the four cell sums of scores. The formula is

$$\frac{1}{4m}\Big(\sum X_{11} + \sum X_{22} - \sum X_{12} - \sum X_{21}\Big)^2$$

For this problem we have

$$\tfrac{1}{80}(217 + 175 - 219 - 124)^2 = \tfrac{1}{80}(49)^2 = 30.0125$$


Table 16.7. Analysis of variance for pursuit learning

Source                                   Sum of Squares    df    Variance Estimate
Rest interval (rows)                        234.6125        1        234.6125
Sessions (columns)                           35.1125        1         35.1125
Interaction                                  30.0125        1         30.0125
Individual differences (within cells)       782.4500       76         10.2954
Total                                      1082.1875       79


Y small component having to do with errors of 
measurement). 


Next consider the effect on pursuit learning of varying the rest interval
and varying the sessions. For sessions we have $F_c$ = 35.1125/10.2954 =
3.41, which is not large enough to lead us to reject the null hypothesis;
but since nonrejection of the null hypothesis does not prove the hypothesis,
we can conclude only that the effect, if it exists, is not large enough to be
demonstrated by the number of cases used. The between-rows or
rest-interval effect is highly significant as judged by $F_r$ =
234.6125/10.2954 = 22.79, which is double the F needed for the .001 level of
significance. Now the fact that the interaction is not significant permits
us to conclude that the rest-interval effect is similar for five sessions
and for three sessions per week. If the interaction had been significant, we
would need to qualify our conclusion about the effect of the rest interval.
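
The computations for this example need only the cell sums and the over-all
sum of squared scores from Table 16.6, so they can be retraced without the
raw scores. The Python sketch below is an illustration under that
assumption; the function and variable names are ours. It follows formulas
(16.7) to (16.10), obtains the interaction by subtraction, and forms each F
with the within-cells estimate as denominator, as was done above.

```python
def two_way_from_cell_sums(cell_sums, sum_sq_all, m):
    """Sums of squares for an R x C layout with m scores per cell.

    cell_sums  : R lists of C cell totals
    sum_sq_all : sum of all mRC squared scores
    m          : number of scores per cell
    """
    n_rows = len(cell_sums)
    n_cols = len(cell_sums[0])
    n = m * n_rows * n_cols
    row_sums = [sum(row) for row in cell_sums]
    col_sums = [sum(row[c] for row in cell_sums) for c in range(n_cols)]
    total = sum(row_sums)

    ss_total = (n * sum_sq_all - total ** 2) / n                          # (16.7)
    ss_rows = (n_rows * sum(s * s for s in row_sums) - total ** 2) / n    # (16.8)
    ss_cols = (n_cols * sum(s * s for s in col_sums) - total ** 2) / n    # (16.9)
    ss_within = (m * sum_sq_all
                 - sum(s * s for row in cell_sums for s in row)) / m      # (16.10)
    ss_inter = ss_total - ss_rows - ss_cols - ss_within
    return {"rows": ss_rows, "columns": ss_cols, "interaction": ss_inter,
            "within": ss_within, "total": ss_total}


# Table 16.6: rows are rest intervals, columns are practice sessions.
cell_sums = [[217, 219], [124, 175]]
m = 20
ss = two_way_from_cell_sums(cell_sums, sum_sq_all=7835, m=m)

df_within = m * 2 * 2 - 2 * 2          # mRC - RC = 76
s2_w = ss["within"] / df_within
f_rows = ss["rows"] / s2_w             # each effect here has 1 df
f_cols = ss["columns"] / s2_w
f_inter = ss["interaction"] / s2_w

# Shortcut for a 2 by 2 table, from the four cell sums.
shortcut = (217 + 175 - 219 - 124) ** 2 / (4 * m)

print(ss)
print(round(s2_w, 4), round(f_rows, 2), round(f_cols, 2), round(f_inter, 2))
print(shortcut)
```

The interaction value found by subtraction and the one found by the 2 by 2
shortcut should agree apart from rounding in the last decimal places.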


ILLUSTRATIONS OF INTERACTION 


Reference to actual examples of statistically significant interaction may 
help clarify its meaning. For this purpose we shall again use some data 
on visual acuity from the experiment by Walker.* For visual acuity (low 
score, better acuity) by two methods of measurement (depth and vernier) 
with binocular and monocular vision, we have means as given in Table 16.8. 


Table 16.8. Visual acuity: interaction of type of measurement with eyes

               Depth     Vernier     Total
Binocular       .08       1.07        .57
Monocular       .24       1.50        .87
Total           .16       1.28        .72


The marginal means are markedly different, and it is readily seen that the 
cell means (each based on 108 determinations) are not consistent with the 
marginal values. The difference, .24 — .08, is not of the same order as 
the difference, 1.50 — 1.07; or stated in another way, the two differences 
differ from each other. In other words, the amount of difference between 
binocular and monocular acuity depends upon the type of measurement. 
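
The statement that "the two differences differ from each other" can be put
in numbers. The short Python sketch below is only an illustration, using the
four cell means of Table 16.8; the names are ours. It computes the
monocular-minus-binocular difference for each type of measurement and then
the difference between those differences, which would be zero if there were
no interaction among the cell means.

```python
# Cell means from Table 16.8 (low score, better acuity).
cell_means = {
    ("binocular", "depth"): 0.08,
    ("binocular", "vernier"): 1.07,
    ("monocular", "depth"): 0.24,
    ("monocular", "vernier"): 1.50,
}


def eye_difference(means, measure):
    """Monocular minus binocular mean for one type of measurement."""
    return means[("monocular", measure)] - means[("binocular", measure)]


diff_depth = eye_difference(cell_means, "depth")
diff_vernier = eye_difference(cell_means, "vernier")
# The difference of the differences; zero would mean no interaction.
contrast = diff_vernier - diff_depth

print("depth:", round(diff_depth, 2))
print("vernier:", round(diff_vernier, 2))
print("difference of differences:", round(contrast, 2))
```

Whether a difference of this size is statistically significant is a separate
question, answered by the F test for interaction and not by this arithmetic.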

One variable investigated in the experiment was the distance of the 
stimulus from the subject. Since distance is an ordered variable, it is 
possible to picture the interaction by making a graph, with acuity as the 
ordinate and distance along the x axis. Fig. 16.1 shows the relationship 
of acuity (average of the two types of measures) and the three distances 
used. Note the difference between the two curves—the significant inter- 
action for eyes and distance actually means that the two curves are different. 
This lack of parallel behavior of curves is more striking in Fig. 16.2, which 
illustrates the interaction of measures with distance, for binocular and 
monocular combined. In this study there was also a significant variance
for the subjects by distance interaction, from which we conclude that the
relationship between acuity and distance varies from person to person
(see Fig. 16.3).

* Walker, E. L., Factors in vernier acuity and distance discrimination, Unpublished
Doctor's Dissertation, Stanford University, California, 1947.

Fig. 16.1. Simple interaction: eyes by distance.
Fig. 16.2. Simple interaction: measures by distance.
Fig. 16.3. Simple interaction: distance by subjects.
Fig. 16.4. Nonsignificant interaction: aperture by stimulus rod width.

It is entirely possible for an effect to be in opposite directions for different
conditions, and the over-all effect need not be significant for this to occur. 

With the concept of interaction in mind, we may revert to a re-examination
of the $s^2_{rc}$ based on the remainder sum of squares when we have R
persons working under C experimental conditions. The remainder will
have to do with whether or not the individuals maintain the same relative 
score level from condition to condition, hence it could be labeled as person 
by condition interaction, but a part of what appears to be interaction 
might be attributable to shifting about because of measurement errors. In 
other words, the variance estimate based on the remainder sum of squares 
will be an estimate made up of error of measurement plus real failure of all 
persons to be affected similarly by the experimental conditions. 


CHOICE OF ERROR TERM IN TWO-WAY 
CLASSIFICATION 


Now that we have learned something about the meaning of interaction 
and have had a couple of examples which illustrate the computations 
and the way hypotheses can be tested, we must specifically consider an as 
yet unmentioned question: Which variance estimate is the correct one to 
use as the error term, that is, as the denominator for the F ratio? The 
answer depends on the mathematical model that is appropriate for a given 
situation. Three models have been set forth by the mathematical statis- 
ticians. These are referred to as the random model, the fixed constants 
model, and the mixed model. Let us define these for the two-way classifi- 
cation setup. 

We have the random model when both classifications involve sampling. 
Such would be the case when rows stand for individuals and columns 
stand for judges (each of whom has rated each individual). The individuals 
and the judges are regarded as random samples from normally distributed 
populations: normal distribution of individuals with respect to the ratings 
and normal distribution for the rating characteristics of the judges. 

We have a fixed constants model when no random sampling is involved 
so far as the bases of the classifications are concerned. Such is the case 
when the classifications depend on such things as size, distance, time 
interval, degree of illumination, etc.; or on such unordered things as sense 
modality, sex, method, diagnostic group, etc. The setup in Table 16.5 
involves the fixed constants model; neither the rest intervals nor the 
sessions were chosen at random. 

We have a mixed model when one basis of classification involves sampling
and the other fixed constants. Table 16.2 illustrates a typical mixed model,
typical in that one basis of classification is individuals.

Each of the three models calls for precisely the same breakdown of the
sum of squares and of the degrees of freedom, and each leads to three
variance estimates plus a within-cells estimate in case we have more than
one score per cell. It should be noted that the within-cells scores can stand
for two kinds of replication. We might have replication in the sense of
having carried out the experiment with more than one person in each cell
(different persons from cell to cell) as in Table 16.5, or we might
[...] measures on the same person or persons. Thus in
[...] measures per person under each of the C
[...] ([...] concerned with replication in the sense of a
[...] experiment by another investigator.)

Actually, for the working statistician the precise formula for the possible
[...] is not nearly so important as the deductions there[...]
[...] variance estimates. Earlier (pp. 256-64) we attempted to explain
[...] estimate, $s^2_b$, under nonnull conditions. Perhaps the student
should turn [...] $s^2_b$ as either [...] or as

$$s^2_b \rightarrow \sigma^2 + \frac{m\sum(\mu_g - \mu)^2}{G - 1}$$

From now on we will simply write out the expected values as set forth by
the mathematical statistician.

The general model for two-way classification may be written as

$$(X_{rck} - \mu) = \alpha_r + \alpha_c + \alpha_r\alpha_c + e_{rck} \qquad (16.11)$$

in which the deviation score from the over-all population mean is thought
of in terms of a row contribution, $\alpha_r$; a column contribution,
$\alpha_c$; an interactive effect, $\alpha_r\alpha_c$; and a normally
distributed random (error) part, $e_{rck}$. The subscript k indicates that we
have replication, m scores per cell, with k taking on values 1, ..., m, but
the m scores in each cell are independent of the scores in all other cells.
The $\alpha_r$, $\alpha_c$, and $\alpha_r\alpha_c$ are all expressed in
deviation form, i.e., possess the property that $\sum\alpha_r = 0$,
$\sum\alpha_c = 0$, $\sum\sum\alpha_r\alpha_c = 0$. Actually, $\alpha_r$ is
used here instead of $\mu_r - \mu$ (or instead of the more precise notation,
$\mu_{r\cdot} - \mu$), with a similar meaning for $\alpha_c$. The
$\alpha_r\alpha_c$ stands for
$(\mu_{rc} - \mu_{r\cdot} - \mu_{\cdot c} + \mu)$.

For the fixed constants (sometimes called fixed effects, sometimes fixed
factor) model, we replace $\alpha$ by A, thus

$$(X_{rck} - \mu) = A_r + A_c + A_rA_c + e_{rck} \qquad (16.12)$$

For the random (sometimes called components of variance) model we
replace $\alpha$ by a, thus

$$(X_{rck} - \mu) = a_r + a_c + a_ra_c + e_{rck} \qquad (16.13)$$

and the mixed model can be written (with columns standing for fixed
constants) as

$$(X_{rck} - \mu) = a_r + A_c + a_rA_c + e_{rck} \qquad (16.14)$$

The $a_r$, $a_c$, $a_ra_c$, and $a_rA_c$ are all assumed to be random variates
from normally distributed populations of effects having variances
$\sigma^2_r$, $\sigma^2_c$, $\sigma^2_{rc}$, and $\sigma^2_{rC}$. Note that
the lower-case subscripts to a $\sigma$ refer to random factors whereas the
upper-case subscript refers to a fixed factor. (No such distinction is needed
for subscripts to $s^2$ nor to $\bar{X}$.) For the fixed values $A_r$, $A_c$
and $A_rA_c$ no assumption as to distribution of effects is required.
Indeed, it is difficult to imagine a distribution of, say, $A_c$ when C = 2.
The "population" of effects consists of just two values, $A_1$ and $A_2$,
which symbols stand for $(\mu_{\cdot 1} - \mu)$ and $(\mu_{\cdot 2} - \mu)$.
Two values, or for that matter the usual small number of fixed effects,
cannot very well be described as to distribution, hence the differences
among them or their variation about an over-all $\mu$ cannot aptly be
described in terms of a $\sigma^2$. Consequently, in the sequel the variation
among them will be specified in terms of $\sum A^2_c$. Likewise, for the
$A_r$ and the $A_rA_c$ we will have $\sum A^2_r$ and $\sum\sum (A_rA_c)^2$,
respectively.

When the m scores per cell represent measurement replication, $s^2_w$ will
be taken as an estimate of $\sigma^2_e$; when the m scores per cell involve m
individuals (measured once), $s^2_w$ will be regarded as an estimate of
individual difference variance, designated $\sigma^2_i$. It is to be
understood that $\sigma^2_i$ has two components: true score variance and
error of measurement variance.

We are now ready to examine the various possible situations involving
two-way classification in order to point out just what is being estimated by
$s^2_r$, $s^2_c$, $s^2_{rc}$ and $s^2_w$. Once this is done, we will be in a
position to choose an appropriate variance estimate, if such is available,
as the denominator, or error, term for F. The question of variance
homogeneity will be discussed after a consideration of eight setups (cases)
involving two-way classification (p. 315).

Case I. Fixed constants model, with m scores (m persons) per cell, a
total of mRC individuals:

$$s^2_r \rightarrow \sigma^2_i + \frac{mC}{R-1}\sum A^2_r$$

$$s^2_c \rightarrow \sigma^2_i + \frac{mR}{C-1}\sum A^2_c$$

$$s^2_{rc} \rightarrow \sigma^2_i + \frac{m}{(R-1)(C-1)}\sum\sum (A_rA_c)^2$$

$$s^2_w \rightarrow \sigma^2_i$$

The general principle in forming an F ratio is to choose two estimates
which differ (in their expected values) by one term only, the term involving
the effect being tested. Accordingly, $s^2_w$ is the correct denominator for
$F_r$, $F_c$, and $F_{rc}$ for testing row, column, and interaction effects.
Note that interaction, if present, has not [...] (row or column) effects.
[...] needed, as we learned in [...] interaction.

Case II. Fixed constants model, RC individuals, one per cell, with each
measured m times: the expectations for the first three estimates will be
precisely the same as in Case I, but now $s^2_w \rightarrow \sigma^2_e$. We
see immediately that this design leads to difficulties. The resulting
$s^2_w$ estimate is useless; if we did use $s^2_w$ as the denominator for
testing, say, $s^2_c$, a significant F [...] would not know whether its
significance [...] or to real individual differences or to a [...]

Case III. Fixed constants model, only one person measured m times under
each of the RC conditions: if we replace $\sigma^2_i$ by $\sigma^2_e$ in the
set of expected values for Case I we will have indicated [...]

[...] the possible situations for two-way analysis of [...] fixed constants
model. If it has occurred to the [...] might be measured under all the RC
conditions, [...] this would involve three-way classification, to be
discussed later. The important thing to have noted is that clear-cut [...]
generalizations to a population of individuals, are [...] of Case I. We have
listed the other two cases [...] to know what not to do. However, we must
[...] exception to a sweeping dismissal of Case III: there [...]
(perceptual) in experimental psychology for which experimentally produced
effects are so large relative to individual differences [...] reasonably
sure that similar significant results will hold for other persons; sure that
is, provided some knowledge of the extent of individual differences is
available. Rarely will the effects be of the same order of magnitude for two
persons; individual by conditions interaction is the rule rather than the
exception.

Case IV. Random model, rows stand for R individuals and columns
stand for C judges with m (ordinarily m will not exceed 2) ratings by each
judge on each individual. The ratings, which must be directed toward
some trait and involve at least a 10-point scale, might be based on observed,
or on a transcribed record of, behavior of the R individuals. (The judges
might find it difficult to rule out memory when making two or more ratings
for each individual.) Instead of C judges making ratings we might have C
examiners or testers, each testing the R individuals twice on, say, the
Rorschach. We have a sample of individuals and a sample of judges (or
examiners). The expected values of the variance estimates are:

$$s^2_r \rightarrow \sigma^2_e + m\sigma^2_{rc} + mC\sigma^2_r$$

$$s^2_c \rightarrow \sigma^2_e + m\sigma^2_{rc} + mR\sigma^2_c$$

$$s^2_{rc} \rightarrow \sigma^2_e + m\sigma^2_{rc}$$

$$s^2_w \rightarrow \sigma^2_e$$

It is obvious that $s^2_w$ can be used as the error term for testing the
interactive effect, but since $s^2_w$ is nothing more than an estimate of
error of measurement variance for the ratings, the conclusion from a
significant F is that interaction holds for these particular R individuals
and C judges; there is no assurance that repetition of the investigation
with R other individuals and C other judges would lead to interaction. As
for the main effects, it is obvious that $s^2_{rc}$ becomes the appropriate
(and only correct) term to use for $F_r$ and $F_c$. A significant $F_r$ would
mean a dependable differentiation of individuals over and above the
variation due to measurement error and judge by individual interaction, and
a significant $F_c$ would indicate real variation from judge to judge in a
possible population of judges.

Case V. Random model, same as Case IV except that m = 1. No
estimate of $\sigma^2_e$ is available, but $s^2_{rc}$ would still be the
error term for $F_r$ and $F_c$.
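
The principle that numerator and denominator should differ, in expectation,
by one term only can be seen by tabulating the expected values for the
random model. The Python sketch below is merely illustrative: the function
name and the numerical variance components are hypothetical, and the
expressions are the four expected values listed under Case IV. With the row
component set at zero, the expected value for rows equals that for
interaction but is twice that for within cells, which is why the interaction
estimate and not the within-cells estimate serves as the error term for
rows.

```python
def expected_values_random(m, R, C, var_e, var_rc, var_r, var_c):
    """Expected values of the four variance estimates, random model (Case IV)."""
    return {
        "rows": var_e + m * var_rc + m * C * var_r,
        "columns": var_e + m * var_rc + m * R * var_c,
        "interaction": var_e + m * var_rc,
        "within": var_e,
    }


# Hypothetical components: no real row variation, but real interaction.
ev = expected_values_random(m=2, R=10, C=4,
                            var_e=1.0, var_rc=0.5, var_r=0.0, var_c=0.3)

# With var_r = 0, rows and interaction have the same expectation ...
print(ev["rows"], ev["interaction"])
# ... whereas rows against within cells differ by the interaction term too.
print(ev["rows"] / ev["within"])
```

These are ratios of expected values, not expected values of F, so they
indicate only which pairs of estimates are comparable under the null
hypothesis.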

Remark: Actually we are hard put to find good illustrations in
psychology for the random model. Any student who attempts to find other
illustrations should keep in mind that it must be possible to classify a 
score simultaneously in two different ways, each involving sampling. 

Case VI. Mixed model, rows stand for individuals (or matched persons),
columns involve C fixed constants (fixed conditions having fixed effects),
and measurement replication leading to m scores per cell:

$$s^2_r \rightarrow \sigma^2_e + mC\sigma^2_r$$

$$s^2_c \rightarrow \sigma^2_e + m\sigma^2_{rC} + \frac{mR}{C-1}\sum A^2_c$$

$$s^2_{rc} \rightarrow \sigma^2_e + m\sigma^2_{rC}$$

$$s^2_w \rightarrow \sigma^2_e$$

The reader will need to recall that lower-case and upper-case letters as
subscripts to a $\sigma^2$ indicate random and fixed factors, respectively.
The interaction term can be tested by $F_{rc} = s^2_{rc}/s^2_w$ [...]
conclude that the differential responses (failure of the individuals to
[...] errors of measurement. Individual by conditions [...] to be
significant. It will be recalled that in [...] interaction term, $a_rA_c$, is
regarded as a [...] becomes a source of random variation which, if real,
will affect the between-columns term. We see from the foregoing that
$s^2_{rc}$ is the proper error term for testing $s^2_c$. To use $s^2_w$ for
this purpose is simply not defensible; if, for example, $s^2_c/s^2_w$ is
significant, it might be so [...] column differences or because of real
interaction or [...] the measurements are completely unreliable. Indeed,
[...]

Case VII. Mixed model, same as Case VI except that m = 1 (no measurement
replication). This does not provide an $s^2_w$, which is [...] $s^2_{rc}$ is
again the proper error term for testing $s^2_c$. [...] under this case. In
psychological research, Case VII is used quite frequently; it provides a
significance test for the differences among a series of C correlated means,
correlated for reasons previously specified (p. 295-96).
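
Why the within-cells estimate is not a defensible error term for columns in
the mixed model can be shown by putting numbers into the expected values of
Case VI. The Python sketch below is an illustration only: the function name
and the variance components are hypothetical. The column effects are set at
zero (so the null hypothesis about columns is true) while a real individual
by conditions interaction is assumed; the expected value for columns then
equals that for interaction, yet exceeds that for within cells.

```python
def expected_values_mixed(m, R, C, var_e, var_r, var_rC, sum_A2_c):
    """Expected values for Case VI: rows random, columns fixed.

    sum_A2_c is the sum of the squared fixed column effects.
    """
    return {
        "rows": var_e + m * C * var_r,
        "columns": var_e + m * var_rC + m * R * sum_A2_c / (C - 1),
        "interaction": var_e + m * var_rC,
        "within": var_e,
    }


# Hypothetical values: no column effects at all, but real interaction.
ev = expected_values_mixed(m=2, R=10, C=3,
                           var_e=1.0, var_r=2.0, var_rC=0.5, sum_A2_c=0.0)

ratio_with_interaction = ev["columns"] / ev["interaction"]
ratio_with_within = ev["columns"] / ev["within"]
print(ratio_with_interaction)   # 1.0: nothing but the tested effect differs
print(ratio_with_within)        # above 1 although no column effect exists
```

As before, these are ratios of expected values only; they show the direction
of the bias and say nothing exact about the sampling distribution of F.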
Case VIII. Mixed model, R rows stand for R individuals and columns
stand for C forms of a test (the reliability of measurement setup discussed
on pp. 296-97):

$$s^2_r \rightarrow \sigma^2_e + C\sigma^2_r$$

$$s^2_c \rightarrow \sigma^2_e + \frac{R}{C-1}\sum A^2_c$$

$$s^2_{rc} \rightarrow \sigma^2_e$$

It will be recalled that $s^2_{rc}$, which was earlier (p. 293) labeled a
remainder term, was shown to depend solely on errors of measurement under the
assumptions usually made in connection with test reliability. We see now
that these assumptions involve the a priori assumption of no interaction,
an assumption which implies, among other things, that possible practice
effects are not different from person to person. Note that in case
interaction is operating, $s^2_c$ will involve an interaction component (as
in Case VII); hence $s^2_{rc}$ is the appropriate error term, regardless of
whether there is or is not interaction, for $F_c$ as a test of the
difference between the C form means or over-all practice effects or both (we
would not know which). But a test of the significance of $s^2_r$ requires the
assumption of no interaction.

Remark about measurement replication: We have seen that having $s^2_w$
as an estimate of $\sigma^2_e$ does not provide us with a useful error term (for F) in
the testing of hypotheses about main effects (and sometimes about inter- 
action) under any of the three mathematical models. This illustrates a 
general principle: when an estimate of error of measurement variance is 
used as the denominator of F, no generalization to a population of persons 
is possible, and hence no generalization of import to science. This raises 
the question as to whether measurement replication is worth while. The 
answer is yes, particularly when it is known that a single measurement is 
not very reliable. By replicating measurement we will obtain more reliable 
scores in the form of the average of m values; hence one source of vari- 
ability in the data will be reduced. The student who has not noticed that 
the analyses involving measurement replication are, in essence, dealing 
with average scores for individuals should ponder further. 

Homogeneity of variance assumption. For Cases I and II it is 
assumed that individual difference variance is the same from cell to cell. 
For Cases III through VIII it is assumed that error of measurement 
variance is homogeneous from cell to cell. The assumption is testable 
(say, by Bartlett's test, p. 249) only for Cases I, III, IV, and VI. 

An additional assumption, seldom mentioned in textbooks of applied 
statistics, is required for those cases where, for C greater than 2, $s^2_{rc}$ is
used as the error term. This is the assumption of homogeneity of inter- 
action variances, the meaning of which may not be quickly obvious to the 
reader. Let us consider the mixed model, with rows standing for indi- 
viduals and with columns for levels on some factor. The interaction sum 


of squares involves

\[ \sum_r \sum_c (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})^2 \]

which, it will be recalled from p. 291, was a simplification of the remainder

\[ \sum_r \sum_c [(X_{rc} - \bar{X}) - (\bar{X}_{r.} - \bar{X}) - (\bar{X}_{.c} - \bar{X})]^2 \]

from which we see that every deviation being squared and summed to get
the interaction sum of squares is one in which the deviation $(X_{rc} - \bar{X})$ is
twice "adjusted," once for the row effect and once for the column effect.
Suppose that the adjustment has been made for the column effect and that
we examine what is left, which is $(X_{rc} - \bar{X}) - (\bar{X}_{r.} - \bar{X})$, or simply
$(X_{rc} - \bar{X}_{r.})$.

Thus, after adjustment for column effect, the interaction sum of squares
is represented by $\sum_r \sum_c (X_{rc} - \bar{X}_{r.})^2$, which in turn can be written as

\[ \sum_r (X_{r1} - \bar{X}_{r.})^2 + \sum_r (X_{r2} - \bar{X}_{r.})^2 + \cdots + \sum_r (X_{rC} - \bar{X}_{r.})^2 \]

... basis for estimating the ... estimates (with due allowance for ...).
... interaction business might be better understood ... foregoing breakdown
into C parts. Consider the first column, $\sum_r (X_{r1} - \bar{X}_{r.})^2$. The $X_{r1}$ are
variates with mean $\bar{X}$ and the $\bar{X}_{r.}$ are also variates with mean $\bar{X}$, hence the
difference $(X_{r1} - \bar{X}_{r.}) = (x_{r1} - \bar{x}_{r.})$ in deviation units. Then

\[ \sum_r (x_{r1} - \bar{x}_{r.})^2 = \sum_r x_{r1}^2 + \sum_r \bar{x}_{r.}^2 - 2\sum_r x_{r1}\bar{x}_{r.} = RS_1^2 + RS_{\bar{x}}^2 - 2R\,r_{1\bar{x}} S_1 S_{\bar{x}} \]

in which $S_1^2$ is the variance (not unbiased estimate) of the scores in the
first column and $S_{\bar{x}}^2$ is the variance of the distribution of the row means,
and $r_{1\bar{x}}$ is the correlation between these two variates. Similarly for the
second column and the cth column we have

\[ RS_2^2 + RS_{\bar{x}}^2 - 2R\,r_{2\bar{x}} S_2 S_{\bar{x}} \]

\[ RS_c^2 + RS_{\bar{x}}^2 - 2R\,r_{c\bar{x}} S_c S_{\bar{x}} \]

and so on to the Cth column.


We can now tease out two possible sources for the heterogeneity of
variance for the subparts entering into the row by column interaction. The
C components can differ because the variances in the several columns
differ and/or because the degree of correlation between the row means and
the column scores varies from column to column. When the latter varia- 
tion occurs, it implies that the C(C — 1)/2 intercorrelations among the 
columns are heterogeneous, a heterogeneity that can readily be a by- 
product of the C experimental conditions or the C levels. Presumably, 
violation of the assumption of homogeneity of variances for the com- 
ponents of interaction will have the usual small effect on the F test; 
actually, there is some evidence that such is the case. 
Incidentally, since the foregoing development permits us to write

\[ \sum_c RS_c^2 + CRS_{\bar{x}}^2 - 2\sum_c R\,r_{c\bar{x}} S_c S_{\bar{x}} \]


as an expression for the interaction sum of squares after adjustment for 
differences in column means, we have the basis for additional insight into 
the meaning of interaction for the mixed two-way model. Suppose, for 
argument's sake, that the variance in each column equals the variance of 
the distribution of row means and that there is perfect correlation between 
the scores in each column and the row means. Under these conditions the 
interaction sum of squares is seen to be zero. Next note that as soon as 
these correlations cease to be perfect, the interaction sum of squares ceases 
to be zero. The lower the correlations, the greater the interaction. Now 
since the correlation of the scores in, say, the first column with the row 
means will equal the correlation with the row sums (because the means 
are merely the sums divided by the constant, C), we can write this $r_{1\bar{x}}$, in
simpler notation, as

\[ r_{1s} = \frac{\sum x_1 (x_1 + x_2 + \cdots + x_C)}{R S_1 S_{\text{sum}}} \]


the numerator of which tells us that this correlation is a function of the 
extent to which the scores in the first column correlate with the scores 
in the other columns; similarly, for the correlation of any column against 
the marginal means. Thus the interaction is a function of the intercorrela- 
tions among the columns—the lower these are, the greater the interaction. 
When it is recalled that errors of measurement tend to lower correlations, 
it is readily seen that the computed interaction depends in part on measure- 
ment errors, as was specified in the expectations given under Case VI. 
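
The identity used in the foregoing argument can be checked numerically. The sketch below is only an illustration: the function names and the two small score tables are hypothetical, not taken from the text. It computes the row by column interaction sum of squares once from the definitional formula and once as the sum over columns of $RS_c^2 + RS_{\bar{x}}^2 - 2R\,r_{c\bar{x}}S_cS_{\bar{x}}$, using the covariance in place of the product $r_{c\bar{x}}S_cS_{\bar{x}}$ (the two are equal by definition of the correlation).

```python
def interaction_ss(table):
    """Row by column interaction SS from the definitional formula."""
    R, C = len(table), len(table[0])
    grand = sum(map(sum, table)) / (R * C)
    row_m = [sum(row) / C for row in table]
    col_m = [sum(table[r][c] for r in range(R)) / R for c in range(C)]
    return sum((table[r][c] - row_m[r] - col_m[c] + grand) ** 2
               for r in range(R) for c in range(C))


def interaction_ss_from_parts(table):
    """Same quantity as sum over columns of R*S_c^2 + R*S_m^2 - 2*R*cov(column, row means)."""
    R, C = len(table), len(table[0])
    row_m = [sum(row) / C for row in table]
    mean_of_row_m = sum(row_m) / R
    dm = [m - mean_of_row_m for m in row_m]        # row means in deviation units
    var_m = sum(d * d for d in dm) / R              # S^2 of row means (divisor R)
    total = 0.0
    for c in range(C):
        col = [table[r][c] for r in range(R)]
        mean_c = sum(col) / R
        dc = [x - mean_c for x in col]              # column scores in deviation units
        var_c = sum(d * d for d in dc) / R          # S^2 of column c (divisor R)
        cov = sum(a * b for a, b in zip(dc, dm)) / R  # equals r * S_c * S_m
        total += R * var_c + R * var_m - 2 * R * cov
    return total


# Hypothetical scores: 4 persons (rows) by 3 conditions (columns).
scores = [[3, 5, 4], [6, 9, 8], [2, 7, 3], [8, 10, 12]]
# Hypothetical additive table: equal column variances, perfect correlations.
additive = [[1, 3, 6], [2, 4, 7], [5, 7, 10]]

print(interaction_ss(scores), interaction_ss_from_parts(scores))
print(interaction_ss(additive))
```

For the additive table every column correlates perfectly with the row means and has the same variance as the row means, so the interaction sum of squares vanishes, as the text states.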

The foregoing argument holds, of course, when the variances within 
the columns are unequal. As an exercise the student might consider the 
situation where the variance in the first column is, say, 4 while that for all 
other columns is 1 and where the correlations are all unity, and thereby 
demonstrate that any differences in column variances also contribute to the 
interaction sum of squares.

THREE-WAY CLASSIFICATION


Suppose that we wish to arrange an investigation so as to let one set of 
data serve to determine whether the variation of a dependent variable is
due to or associated with variation on three independent variables. Again, 
the term independent variable is being used in its broad sense. It might be 
a "real" variable like illumination, temperature, amount of food, length 
of rest interval; or it might be a variable having to do with qualitative 
differences, such as kind of food, type of motivation or incentive, various 
psychological sets. It makes no difference whether the variables are
manipulatable in the laboratory, as would be true of all those mentioned, 
or whether the desired variation is secured by appropriate choice of cases. 

It is necessary that we be able to assign individuals or scores to each
combination of groupings made possible by whatever classifications we
have on the three independent variables. Let us suppose that there are C
categories on one variable, R on another, and B on a third. For purposes
of exposition and as a systematic way of arranging the data, let the C
categories define C columns, the R categories R rows, and the B categories
B blocks. Let $X_{rbc}$ represent the score in the rth row, bth block, and cth
column, and let us assume for the time being that there is one score
for each combination (Table 16.9).

Note in particular how the various sums are specified and their location
in the table. The first two subscripts in $\sum_c X_{11c}$ indicate that this sum has to
do with the first row and first block, and that in the summing c takes on
values from 1 to C. The general expression for all such sums is $\sum_c X_{rbc}$.
The symbol $\sum_r X_{r11}$ stands for the sum of scores in the first column and
first block; r takes on values of 1 to R. The corresponding general symbol
is $\sum_r X_{rbc}$. In next to the bottom section of the table will be found
$\sum_b X_{1b1}$ as the sum for all the cases in row 1 and column 1, the summing
being through blocks; i.e., b takes on values from 1 to B. The general
expression for such sums is $\sum_b X_{rbc}$. The sum of all the scores in the first
block is symbolized as $\sum_r\sum_c X_{r1c}$ and in the bth block as $\sum_r\sum_c X_{rbc}$.
For the sum of all the scores in the first column, irrespective of row or
block, we have $\sum_r\sum_b X_{rb1}$, and the general expression is $\sum_r\sum_b X_{rbc}$.
The symbol $\sum_b\sum_c X_{1bc}$ stands for the sum of all scores in the first row,
and $\sum_b\sum_c X_{rbc}$ is the corresponding general expression.

Note also how the "dot" notation is used to specify the several means.
The subscript which has been replaced by a dot indicates the direction of
the addition required to obtain the sum for the given mean. Thus in
$\bar{X}_{.24}$ the dot replaces r; this mean is based on R scores, with r running
from 1 to R when we sum. The subscripts which are left denote that the
mean is for scores in the second block and fourth column. The total
number of means will be as follows:

RB means of the form $\bar{X}_{rb.}$
RC means of the form $\bar{X}_{r.c}$
BC means of the form $\bar{X}_{.bc}$
R means of the form $\bar{X}_{r..}$
B means of the form $\bar{X}_{.b.}$
C means of the form $\bar{X}_{..c}$
One mean of the form $\bar{X}_{...}$ = total mean = $\bar{X}$

Table 16.9. Score and sum schema for three-way classification

                                Column
            Row    1                      c                      C                      Sum                              Mean
Block 1     1      $X_{111}$              $X_{11c}$              $X_{11C}$              $\sum_c X_{11c}$                 $\bar{X}_{11.}$
            r      $X_{r11}$              $X_{r1c}$              $X_{r1C}$              $\sum_c X_{r1c}$                 $\bar{X}_{r1.}$
            R      $X_{R11}$              $X_{R1c}$              $X_{R1C}$              $\sum_c X_{R1c}$                 $\bar{X}_{R1.}$
            Sum    $\sum_r X_{r11}$       $\sum_r X_{r1c}$       $\sum_r X_{r1C}$       $\sum_r\sum_c X_{r1c}$
            Mean   $\bar{X}_{.11}$        $\bar{X}_{.1c}$        $\bar{X}_{.1C}$        $\bar{X}_{.1.}$ (mean, block 1)

Block b     1      $X_{1b1}$              $X_{1bc}$              $X_{1bC}$              $\sum_c X_{1bc}$                 $\bar{X}_{1b.}$
            r      $X_{rb1}$              $X_{rbc}$              $X_{rbC}$              $\sum_c X_{rbc}$                 $\bar{X}_{rb.}$
            R      $X_{Rb1}$              $X_{Rbc}$              $X_{RbC}$              $\sum_c X_{Rbc}$                 $\bar{X}_{Rb.}$
            Sum    $\sum_r X_{rb1}$       $\sum_r X_{rbc}$       $\sum_r X_{rbC}$       $\sum_r\sum_c X_{rbc}$
            Mean   $\bar{X}_{.b1}$        $\bar{X}_{.bc}$        $\bar{X}_{.bC}$        $\bar{X}_{.b.}$ (mean, block b)

Block B     1      $X_{1B1}$              $X_{1Bc}$              $X_{1BC}$              $\sum_c X_{1Bc}$                 $\bar{X}_{1B.}$
            r      $X_{rB1}$              $X_{rBc}$              $X_{rBC}$              $\sum_c X_{rBc}$                 $\bar{X}_{rB.}$
            R      $X_{RB1}$              $X_{RBc}$              $X_{RBC}$              $\sum_c X_{RBc}$                 $\bar{X}_{RB.}$
            Sum    $\sum_r X_{rB1}$       $\sum_r X_{rBc}$       $\sum_r X_{rBC}$       $\sum_r\sum_c X_{rBc}$
            Mean   $\bar{X}_{.B1}$        $\bar{X}_{.Bc}$        $\bar{X}_{.BC}$        $\bar{X}_{.B.}$ (mean, block B)

Sums        1      $\sum_b X_{1b1}$       $\sum_b X_{1bc}$       $\sum_b X_{1bC}$       $\sum_b\sum_c X_{1bc}$           $\bar{X}_{1..}$
through     r      $\sum_b X_{rb1}$       $\sum_b X_{rbc}$       $\sum_b X_{rbC}$       $\sum_b\sum_c X_{rbc}$           $\bar{X}_{r..}$
blocks      R      $\sum_b X_{Rb1}$       $\sum_b X_{Rbc}$       $\sum_b X_{RbC}$       $\sum_b\sum_c X_{Rbc}$           $\bar{X}_{R..}$
            Sum    $\sum_r\sum_b X_{rb1}$ $\sum_r\sum_b X_{rbc}$ $\sum_r\sum_b X_{rbC}$ $\sum_r\sum_b\sum_c X_{rbc}$     $\bar{X}_{...}$

Means for   1      $\bar{X}_{1.1}$        $\bar{X}_{1.c}$        $\bar{X}_{1.C}$        $\bar{X}_{1..}$ (means for rows)
rows by     r      $\bar{X}_{r.1}$        $\bar{X}_{r.c}$        $\bar{X}_{r.C}$        $\bar{X}_{r..}$
columns     R      $\bar{X}_{R.1}$        $\bar{X}_{R.c}$        $\bar{X}_{R.C}$        $\bar{X}_{R..}$
Column means       $\bar{X}_{..1}$        $\bar{X}_{..c}$        $\bar{X}_{..C}$        $\bar{X}_{...} = \bar{X}$

Fig. 16.5. Geometric picture of three-way classification.

... placed along the right-left axis for the groups defined by the columns;
summing down on the side leads to the means along the third axis for the
groups defined by the blocks. To get any of these means it is, of course,
assumed that the sum involved is divided by the proper number.

Of primary interest is the question: Is the variation among the means 
along the edges, considered separately, larger than expected on the basis 
of chance? To answer this we need to break down the sum of squares of 
deviations from the total mean into appropriate components. The score
$X_{rbc}$ in the cubicle defined by the rth row, bth block, and cth column will
vary more or less from $\bar{X}$, and three possible sources of variation for $X_{rbc}$
are obvious: the deviation of its row mean, its column mean, and its
block mean from $\bar{X}$. Now, if we recall the situation for double classification,
it is fairly obvious that, when the score $X_{rbc}$ is considered as belonging
in row r and column c, one source of variation becomes the remainder 
or interaction for rows and columns; considered next as also falling in 
row r and block b, another source of variation is the possible interaction 
of rows and blocks; and then thought of as belonging to column c and 
block b, the score also involves the interaction of columns and blocks. 

When the sums of squares for these six components are added, it will be 
discovered that they do not sum to the sum of squares for the total; i.e., 
subtracting these six sums from the total sum leaves a remainder. This 
residual is sometimes referred to as error, more frequently as a three-way 
interaction. This term involves rows, blocks, and columns. The reader, 
having in mind the idea that the simple row by column interaction has to do 
with the possible failure of cell entries to be consistent with the two sets of 
marginal means, must now try imagining that the RBC entries in the 
cubical cells of our box may not be entirely consistent with the three sets of 
means on the edges and with the three sets on the surface. We have seen 
that a statistical check on two-way interaction is not possible with only one 
entry per cell; similarly more than one score per cubicle is required for 
testing three-way interaction. 

Table 16.10 gives the essentials, in symbols, for the analysis of variance 
for the triple classification setup. In order to specify the interactions, we 
here adopt the abbreviation scheme generally used. Thus R x B, read 
R by B, indicates the row and block interaction, and R x B x C stands 
for the row by block by column or three-way interaction. In a given 
investigation, the rows, blocks, and columns refer to particular indepen- 
dent or classificatory variables. 

It will be noted in Table 16.10 that the df for the three-way interaction
term is given as (R − 1)(B − 1)(C − 1). The student may be helped in
understanding the reasoning which leads to this df by referring again to
Fig. 16.5. The surface means tend to restrict the deviation score values
within the box. How many cubical cells can we fill before these restrictions
operate? The general rule-of-thumb procedure for determining the df for
interaction sums of squares is to take the product of the dfs of the variables
involved in the given interaction. This holds for two-way, three-way, and
higher-order interactions.

Table 16.10. Variance table for three-way classification into R rows,
B blocks, and C columns

Source                  Sum of Squares                                                                   df                        Variance Estimate
Rows                    $BC\sum_r(\bar{X}_{r..} - \bar{X})^2$                                             R − 1                     $s^2_r$
Blocks                  $RC\sum_b(\bar{X}_{.b.} - \bar{X})^2$                                             B − 1                     $s^2_b$
Columns                 $RB\sum_c(\bar{X}_{..c} - \bar{X})^2$                                             C − 1                     $s^2_c$
R x B interaction       $C\sum_r\sum_b(\bar{X}_{rb.} - \bar{X}_{r..} - \bar{X}_{.b.} + \bar{X})^2$        (R − 1)(B − 1)            $s^2_{rb}$
R x C interaction       $B\sum_r\sum_c(\bar{X}_{r.c} - \bar{X}_{r..} - \bar{X}_{..c} + \bar{X})^2$        (R − 1)(C − 1)            $s^2_{rc}$
B x C interaction       $R\sum_b\sum_c(\bar{X}_{.bc} - \bar{X}_{.b.} - \bar{X}_{..c} + \bar{X})^2$        (B − 1)(C − 1)            $s^2_{bc}$
R x B x C interaction   $\sum_r\sum_b\sum_c(X_{rbc} - \bar{X}_{rb.} - \bar{X}_{r.c} - \bar{X}_{.bc} + \bar{X}_{r..} + \bar{X}_{.b.} + \bar{X}_{..c} - \bar{X})^2$   (R − 1)(B − 1)(C − 1)   $s^2_{rbc}$
Total                   $\sum_r\sum_b\sum_c(X_{rbc} - \bar{X})^2$                                         RBC − 1
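
As a check on the breakdown of Table 16.10 and on the rule of thumb for df, the following sketch computes the seven definitional sums of squares for a small set of scores and confirms that they add to the total sum of squares and that the dfs add to RBC − 1. The function name `three_way_ss` and the 2 x 2 x 3 scores are hypothetical, chosen only for illustration; the sketch assumes one score per cubicle.

```python
from itertools import product


def three_way_ss(X):
    """Definitional sums of squares and df (Table 16.10) for scores X[r][b][c]."""
    R, B, C = len(X), len(X[0]), len(X[0][0])
    cells = list(product(range(R), range(B), range(C)))

    def means(keep):
        # Average over every subscript not named in `keep` (the "dotted" ones).
        sums, counts = {}, {}
        for r, b, c in cells:
            key = tuple(i for i, name in zip((r, b, c), "rbc") if name in keep)
            sums[key] = sums.get(key, 0.0) + X[r][b][c]
            counts[key] = counts.get(key, 0) + 1
        return {k: sums[k] / counts[k] for k in sums}

    g = means("")[()]
    mr, mb, mc = means("r"), means("b"), means("c")
    mrb, mrc, mbc = means("rb"), means("rc"), means("bc")

    names = ["R", "B", "C", "RxB", "RxC", "BxC", "RxBxC", "Total"]
    ss = {name: 0.0 for name in names}
    for r, b, c in cells:
        x = X[r][b][c]
        dr, db, dc = mr[(r,)] - g, mb[(b,)] - g, mc[(c,)] - g
        drb = mrb[(r, b)] - mr[(r,)] - mb[(b,)] + g
        drc = mrc[(r, c)] - mr[(r,)] - mc[(c,)] + g
        dbc = mbc[(b, c)] - mb[(b,)] - mc[(c,)] + g
        drbc = x - g - dr - db - dc - drb - drc - dbc   # three-way remainder
        for name, d in zip(names, (dr, db, dc, drb, drc, dbc, drbc, x - g)):
            ss[name] += d * d

    df = {"R": R - 1, "B": B - 1, "C": C - 1,
          "RxB": (R - 1) * (B - 1), "RxC": (R - 1) * (C - 1),
          "BxC": (B - 1) * (C - 1),
          "RxBxC": (R - 1) * (B - 1) * (C - 1),
          "Total": R * B * C - 1}
    return ss, df


# Hypothetical scores: 2 rows, 2 blocks, 3 columns.
X = [[[3, 5, 9], [4, 8, 7]],
     [[6, 5, 12], [9, 11, 10]]]
ss, df = three_way_ss(X)
parts = [k for k in ss if k != "Total"]
print(sum(ss[k] for k in parts), ss["Total"])
print(sum(df[k] for k in parts), df["Total"])
```

Each interaction df in the dictionary is simply the product of the dfs of the variables involved, which is the rule stated above.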


SPECIAL CASE WHERE THE ROWS STAND FOR
PERSONS OR MATCHED INDIVIDUALS


... purpose of a study is to ascertain whether variation on ... is influenced
by or associated with variation on ... This, of course, involves the double
classification ... an individual under ... RC cells ...

2. If we are dealing with a situation in which it is required that observations
be made on the same individual in each of the RC conditions, and if
more than one case is used either to reduce errors or to provide a basis for 
generalizing to a population, it is necessary that we make statistical allow- 
ance for the fact that the RC observations on the m cases are nonindepen- 
dent, or correlated. This allowance was not possible by the two-way 
classification scheme, for which it was assumed that the m scores in one 
cell were independent of the observations in the other cells. 

It will be recalled that in the two-way classification setup, by letting one 
classification refer to R individuals or sets of matched cases, we were 
provided with an over-all test of significance for several correlated means 
for groups classified on a single independent variable. Triple classification 
permits a similar test of correlated means for groups involved in double
classification.

Since the assigning of the bases of classification to rows, blocks, and
columns is arbitrary, we shall let the R rows stand for R individuals
(or R matched persons), with the blocks and columns representing the
independent variables to be investigated.


COMPUTATIONAL ILLUSTRATION FOR THREE-WAY 
CLASSIFICATION 


The task of computing the required sums of squares (see Table 16.10) is
tedious. The first step is to arrange the data in some such systematic order
as that depicted in Table 16.9 and do the necessary adding to secure the
various sums indicated in that table. The total sum of squares for all RBC
cases is obtained as usual: sum all the scores, sum all the squared scores,
and substitute in the general formula $(1/RBC)[RBC\sum X^2 - (\sum X)^2]$.

To secure the three between-groups and the three simple interaction sums
of squares, we form three subtables involving sums taken in various directions.
For the first of these subtables we take row by column sums
obtained by adding cell entries from block to block, i.e., through the B
blocks. The next to the bottom section of Table 16.9 contains these row
by column sums, which we reproduce here as Table 16.11a. The reader
will note that the values for Table 16.11b are the right-hand margin sums of
Table 16.9 and that the values for Table 16.11c are found as the sums in
Table 16.9 along the bottom of each block.

With these auxiliary tables in mind, we can write the required computational
formulas. The simple interaction terms are secured by computing
a subtotal sum of squares for each table and then subtracting therefrom the
two appropriate "between" sums of squares. These subtotal sums of
squares will not be the same as the total sum of squares obtained for
all RBC cases.

Table 16.11a. Required sums for row by column analysis

        1                        c                        C                        Sum
1       $\sum_b X_{1b1}$         $\sum_b X_{1bc}$         $\sum_b X_{1bC}$         $\sum_b\sum_c X_{1bc}$
r       $\sum_b X_{rb1}$         $\sum_b X_{rbc}$         $\sum_b X_{rbC}$         $\sum_b\sum_c X_{rbc}$
R       $\sum_b X_{Rb1}$         $\sum_b X_{Rbc}$         $\sum_b X_{RbC}$         $\sum_b\sum_c X_{Rbc}$
Sum     $\sum_r\sum_b X_{rb1}$   $\sum_r\sum_b X_{rbc}$   $\sum_r\sum_b X_{rbC}$   $\sum_r\sum_b\sum_c X_{rbc}$

Table 16.11b. Required sums for row by block analysis

        1                        b                        B                        Sum
1       $\sum_c X_{11c}$         $\sum_c X_{1bc}$         $\sum_c X_{1Bc}$         $\sum_b\sum_c X_{1bc}$
r       $\sum_c X_{r1c}$         $\sum_c X_{rbc}$         $\sum_c X_{rBc}$         $\sum_b\sum_c X_{rbc}$
R       $\sum_c X_{R1c}$         $\sum_c X_{Rbc}$         $\sum_c X_{RBc}$         $\sum_b\sum_c X_{Rbc}$
Sum     $\sum_r\sum_c X_{r1c}$   $\sum_r\sum_c X_{rbc}$   $\sum_r\sum_c X_{rBc}$   $\sum_r\sum_b\sum_c X_{rbc}$

Table 16.11c. Required sums for block by column analysis

        1                        c                        C                        Sum
1       $\sum_r X_{r11}$         $\sum_r X_{r1c}$         $\sum_r X_{r1C}$         $\sum_r\sum_c X_{r1c}$
b       $\sum_r X_{rb1}$         $\sum_r X_{rbc}$         $\sum_r X_{rbC}$         $\sum_r\sum_c X_{rbc}$
B       $\sum_r X_{rB1}$         $\sum_r X_{rBc}$         $\sum_r X_{rBC}$         $\sum_r\sum_c X_{rBc}$
Sum     $\sum_r\sum_b X_{rb1}$   $\sum_r\sum_b X_{rbc}$   $\sum_r\sum_b X_{rbC}$   $\sum_r\sum_b\sum_c X_{rbc}$

Subtotal: row by column

\[ \frac{1}{RBC}\Big[RC\sum_r\sum_c\Big(\sum_b X_{rbc}\Big)^2 - \Big(\sum_r\sum_b\sum_c X_{rbc}\Big)^2\Big] \qquad (16.15a) \]

Subtotal: row by block

\[ \frac{1}{RBC}\Big[RB\sum_r\sum_b\Big(\sum_c X_{rbc}\Big)^2 - \Big(\sum_r\sum_b\sum_c X_{rbc}\Big)^2\Big] \qquad (16.15b) \]

Subtotal: block by column

\[ \frac{1}{RBC}\Big[BC\sum_b\sum_c\Big(\sum_r X_{rbc}\Big)^2 - \Big(\sum_r\sum_b\sum_c X_{rbc}\Big)^2\Big] \qquad (16.15c) \]

From the right-hand margin of either Table 16.11a or 16.11b we can
compute the sum of squares for

Between rows:

\[ \frac{1}{RBC}\Big[R\sum_r\Big(\sum_b\sum_c X_{rbc}\Big)^2 - \Big(\sum_r\sum_b\sum_c X_{rbc}\Big)^2\Big] \qquad (16.15d) \]

From the bottom of either Table 16.11a or 16.11c we can obtain the sum of
squares for

Between columns:

\[ \frac{1}{RBC}\Big[C\sum_c\Big(\sum_r\sum_b X_{rbc}\Big)^2 - \Big(\sum_r\sum_b\sum_c X_{rbc}\Big)^2\Big] \qquad (16.15e) \]

From the bottom of Table 16.11b or from the right-hand margin of Table
16.11c we can calculate the sum of squares for

Between blocks:

\[ \frac{1}{RBC}\Big[B\sum_b\Big(\sum_r\sum_c X_{rbc}\Big)^2 - \Big(\sum_r\sum_b\sum_c X_{rbc}\Big)^2\Big] \qquad (16.15f) \]

Then from the above six sums of squares the simple interaction sums of
squares may be secured by the following subtractions:

Row by column interaction: (16.15a) − (16.15d) − (16.15e)      (16.16a)
Row by block interaction: (16.15b) − (16.15d) − (16.15f)       (16.16b)
Block by column interaction: (16.15c) − (16.15e) − (16.15f)    (16.16c)

And finally, again by subtraction, we have the sum of squares for the row
by column by block, or

Three-way interaction: Total sum of squares minus (16.15def)
minus (16.16abc).


We will illustrate the procedure by using the data of Table 16.12, in 
which the blocks represent two levels of illumination, the columns three 
degrees of albedo, and the rows four individuals, and the scores are judged 
whiteness. Notice that each subject made judgments under all six of the 
combinations of conditions. The sums given in Table 16.12 become the 
entries for the auxiliary computational Tables 16.13abc. The needed value
of $\sum_r\sum_b\sum_c X_{rbc}$ is 898, and the sum of all the squared scores,
$\sum_r\sum_b\sum_c X^2_{rbc}$, is 44,394. From these figures we have

(1/24)[24(44,394) − (898)²] = 10,793.83 = total sum of squares

Table 16.12. Data used in illustrating computations for three-way classification:
2 levels of illumination (blocks), 3 albedos (columns), and 4 observers (rows)*

                                       Albedo
Illumination     Observer      .07       .14       .26       Sum      Mean
   1.20             1           11        24        60        95     31.67
                    2           22        26        44        92     30.67
                    3           16        22        55        93     31.00
                    4           20        32        82       134     44.67
                   Sum          69       104       241       414     34.50
                   Mean      17.25     26.00     60.25     34.50

   2.00             1           14        24        65       103     34.33
                    2           27        36        47       110     36.67
                    3           18        24        62       104     34.67
                    4           24        59        84       167     55.67
                   Sum          83       143       258       484     40.33
                   Mean      20.75     35.75     64.50     40.33

Sums through        1           25        48       125       198     33.00
blocks              2           49        62        91       202     33.67
                    3           34        46       117       197     32.83
                    4           44        91       166       301     50.17
                   Sum         152       247       499       898     37.42

Means for           1        12.50     24.00     62.50     33.00
rows by             2        24.50     31.00     45.50     33.67
columns             3        17.00     23.00     58.50     32.83
                    4        22.00     45.50     83.00     50.17
Column means                 19.00     30.87     62.38     37.42

* Data from R. E. Taubman, J. Exp. Psychol., 1945, 35, 235-241. 


The various "between" sums can readily be obtained by adding the
squares of the appropriate marginal sums of auxiliary Tables 16.13abc,
and substituting in formulas (16.15def).

For between blocks we need (414)² + (484)² = 405,652;
For between columns we need (152)² + (247)² + (499)² = 333,114;
For between rows we need (198)² + (202)² + (197)² + (301)² = 209,418.

Table 16.13a. Required sums for block by column analysis

                        Albedo
Illumination      .07      .14      .26      Sum
   1.20            69      104      241      414
   2.00            83      143      258      484
   Sum            152      247      499      898

Table 16.13b. Required sums for row by block analysis 


Individuals 
Illumination 1 2 3 4 Sum 
1.20 95 92 93 134 414 
2.00 103 110 104 167 484 
Sum 198 202 197 301 898 


Table 16.13c. Required sums for row by column analysis

                        Albedo
Individual        .07      .14      .26      Sum
    1              25       48      125      198
    2              49       62       91      202
    3              34       46      117      197
    4              44       91      166      301
   Sum            152      247      499      898


Then we have

(1/24)[2(405,652) − (898)²] = 204.17 for between-blocks sum of squares
(1/24)[3(333,114) − (898)²] = 8039.08 for between-columns sum of squares
(1/24)[4(209,418) − (898)²] = 1302.83 for between-rows sum of squares

In order to secure the subtotal sums of squares we add the squares of the
cell entries in the auxiliary tables. For the block by column subtotal we
have from Table 16.13a:

(69)² + (83)² + (104)² + (143)² + (241)² + (258)² = 167,560

Similarly for the row by block subtotal we have from Table 16.13b:

(95)² + (103)² + ··· + (167)² = 105,508

and for the row by column subtotal we have from Table 16.13c:

(25)² + ··· + (44)² + ··· + (166)² = 87,814

These three sums can now be substituted into formulas (16.15abc):

(1/24)[6(167,560) − (898)²] = 8289.83 = block by column subtotal sum of squares
(1/24)[8(105,508) − (898)²] = 1569.17 = row by block subtotal sum of squares
(1/24)[12(87,814) − (898)²] = 10,306.83 = row by column subtotal sum of squares

Next we get the simple interaction sums of squares by the subtractions
indicated in formulas (16.16abc):

8289.83 − 204.17 − 8039.08 = 46.58 = block by column interaction
1569.17 − 204.17 − 1302.83 = 62.17 = row by block interaction
10,306.83 − 8039.08 − 1302.83 = 964.92 = row by column interaction

Then for the three-way interaction sum of squares we have

10,793.83 − 204.17 − 8039.08 − 1302.83 − 46.58 − 62.17 − 964.92 = 174.08
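
The arithmetic above can be retraced mechanically. The following sketch applies the pattern of formulas (16.15) and (16.16) to the scores of Table 16.12; the variable names and the helper `ss` are illustrative choices, not part of the text, and the data are indexed here as block, row, column.

```python
# Scores data[b][r][c]: 2 illumination levels (blocks), 4 observers (rows),
# 3 albedos (columns), copied from Table 16.12.
data = [
    [[11, 24, 60], [22, 26, 44], [16, 22, 55], [20, 32, 82]],   # illumination 1.20
    [[14, 24, 65], [27, 36, 47], [18, 24, 62], [24, 59, 84]],   # illumination 2.00
]
B, R, C = len(data), len(data[0]), len(data[0][0])
N = R * B * C
scores = [x for block in data for row in block for x in row]
grand = sum(scores)


def ss(sums, k):
    """(1/RBC)[k * (sum of squared sums) - (grand sum)^2], as in formulas (16.15)."""
    return (k * sum(s * s for s in sums) - grand ** 2) / N


rows_ = range(R); blocks_ = range(B); cols_ = range(C)
row_sums = [sum(data[b][r][c] for b in blocks_ for c in cols_) for r in rows_]
col_sums = [sum(data[b][r][c] for b in blocks_ for r in rows_) for c in cols_]
block_sums = [sum(data[b][r][c] for r in rows_ for c in cols_) for b in blocks_]
rc_sums = [sum(data[b][r][c] for b in blocks_) for r in rows_ for c in cols_]
rb_sums = [sum(data[b][r][c] for c in cols_) for r in rows_ for b in blocks_]
bc_sums = [sum(data[b][r][c] for r in rows_) for b in blocks_ for c in cols_]

total = ss(scores, N)                       # each score is a "sum" of one entry
between = {"rows": ss(row_sums, R), "columns": ss(col_sums, C),
           "blocks": ss(block_sums, B)}
inter = {
    "row x column": ss(rc_sums, R * C) - between["rows"] - between["columns"],
    "row x block": ss(rb_sums, R * B) - between["rows"] - between["blocks"],
    "block x column": ss(bc_sums, B * C) - between["blocks"] - between["columns"],
}
three_way = total - sum(between.values()) - sum(inter.values())

table = {"total": total, **between, **inter, "three-way": three_way}
for name, value in table.items():
    print(f"{name:15s} {value:10.2f}")
```

The printed values agree with the figures obtained by hand above.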


The several sums of squares, their dfs, and the resulting variance
estimates are brought together in Table 16.14.

Table 16.14. Analysis of variance for judged whiteness by 4 observers for 3
degrees of albedo and 2 levels of illumination

                                       Sum of               Variance
Source                                 Squares       df     Estimate
Illumination                            204.17        1       204.17
Albedo                                8,039.08        2     4,019.54
Subjects (individual differences)     1,302.83        3       434.28
Interaction: I x A                       46.58        2        23.29
Interaction: I x S                       62.17        3        20.72
Interaction: A x S                      964.92        6       160.82
Interaction: I x A x S                  174.08        6        29.01

Total                                10,793.83       23

First we use the three-way interaction as a basis for testing the significance
of the simple interactions. Of chief interest in this example is the
possible interaction between albedo and illumination, but since this
interaction variance is less than that for three-way interaction, we know at
once without computing F that the interaction is insignificant. The
illumination by individual interaction is also insignificant. The interaction
of albedo with individuals yields an F of 160.82/29.01 = 5.54, which, for
$n_1$ = 6 and $n_2$ = 6, falls between the values of 4.28 and 8.47 for the .05 and
.01 levels respectively. This F of 5.54 is high enough to suggest that the
form of the relationship between judged whiteness and albedo varies
somewhat from person to person.

Now we turn to a test of the main effects. A test of the significance of 
row differences is a test of individual differences and is accordingly of little 
interest. For illumination, with the illumination by subjects interaction as
error term, we have F = 204.17/20.72 = 9.85, which falls
near the 10.13 required for P = .05, and is therefore suggestive of a real
difference due to illumination. For albedo, with the albedo by subjects
interaction as error term, we have F = 4019.54/160.82
= 24.99, which is highly significant.
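
The F ratios just discussed can be reproduced from the variance estimates of Table 16.14. In the sketch below the dictionary `ms`, the helper `f_ratio`, and the dictionary `critical` are illustrative names; the critical values are only those quoted in the text, not values computed here.

```python
# Variance estimates and df copied from Table 16.14
# (I = illumination, A = albedo, S = subjects).
ms = {
    "I": (204.17, 1), "A": (4019.54, 2), "S": (434.28, 3),
    "IxA": (23.29, 2), "IxS": (20.72, 3), "AxS": (160.82, 6),
    "IxAxS": (29.01, 6),
}

# Critical values quoted in the text, keyed by (n1, n2): (.05 level, .01 level).
critical = {(6, 6): (4.28, 8.47), (1, 3): (10.13, None)}


def f_ratio(effect, error):
    """Return (F, n1, n2) for `effect` tested against the `error` term."""
    (numerator, n1), (denominator, n2) = ms[effect], ms[error]
    return round(numerator / denominator, 2), n1, n2


tests = [("IxA", "IxAxS"), ("IxS", "IxAxS"), ("AxS", "IxAxS"),
         ("I", "IxS"), ("A", "AxS")]
for effect, error in tests:
    F, n1, n2 = f_ratio(effect, error)
    print(f"{effect:4s} vs {error:6s} F = {F:6.2f}  (n1 = {n1}, n2 = {n2})")
```

The first two ratios fall below unity, which is why the text can dismiss those interactions without computing F.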

Actually, the foregoing results are not to be regarded as conclusive. 
The data which we have used to illustrate the computations are only a part 
of more complete data which involved additional degrees of albedo and 
other levels of illumination. Partly because of space limitations and partly 
because it is easier to illustrate the computations when only a few rows, 
columns, and blocks are involved, we have ignored a part of the available 
data. 

It should be kept in mind that this illustration is an example of the use of 
the three-way classification scheme as a method for making allowance for 
the use of correlated observations in a problem of double classification 
involving the influence of two variables on a third. In this special use of 
three-way classification, in which the rows correspond to individuals, the
objective is identical with that in the earlier analysis of pursuit rotor
learning (Table 16.7). The two situations are similar in that there are m
(or R) scores in each cell; they are different in that the m scores in any one
cell for the pursuit learning problem are independent of the m scores in
other cells, whereas the R scores in each of the albedo-illumination cells
are correlated: each person contributes a score to each cell. Both schemes
permit a check on the interaction effect of the two independent variables
used to classify the observations. The use of BC observations on each
of R cases (if feasible) will yield more precise information than obtainable
by having scores for m individuals in each of the BC cells. This is analogous
to the well-known principle that experimentation in which individuals
serve as their own controls tends to be more precise than that in which an
independent control group is set up.

THREE-WAY CLASSIFICATION WITH m CASES PER
CUBICLE


We have seen how the possible association of a dependent variable with 
three independent variables can be tested by a variance analysis made on a 
triple classification basis. If we wish either to base our results on more
than RBC observations or to test the significance of the three-way interaction,
it is necessary to have more than one score in each cubicle. This
can be accomplished either by assigning m individuals to each of the RBC 
combinations of conditions or by using just m individuals with each 
yielding an observation under all the RBC conditions or by using m sets of 
RBC cases with one individual of each set assigned to each of the RBC 
groups. Matching may not be feasible; neither may the securing of RBC 
observations on each of m individuals be feasible. At times, however, the 
problem under consideration may require an observation on each individ- 
ual under all the conditions. Whether m individuals are so used by prefer- 
ence or by necessity, we will have m measurements in each of the RBC 
cubicles, but in testing the significance of the differences between the
means of rows or of columns or of blocks we will be dealing with a situation
in which the means are correlated because they are based upon the
same individuals. To allow for this fact we would need a four-way classification
setup.


Let us next consider the case in which we have in each cubicle m scores,
which are independent of the m scores in other cubicles. The total number
of scores will, of course, be mRBC, and the breakdown of the total sum
of squares will include the components specified in Table 16.10 plus a
within-cubicles sum of squares. Since each cubicle defines a group, the
within-cubicles sum of squares does not differ from previously discussed
"within" sums of squares. The formula in this case is

\[ \sum X^2 - \frac{1}{m}\sum_r\sum_b\sum_c\Big(\sum^{m} X\Big)^2 \]

in which it is understood that the $\sum X^2$ term contains mRBC squares and
that the subtractive term indicates that we first sum the m scores separately
for each cubicle, then square each of these sums, and finally sum all these
RBC squared sums. The df for this term will be mRBC − RBC because
we are dealing with the deviations of mRBC scores about RBC different
means.
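
A minimal sketch of the within-cubicles computation follows. The scores are hypothetical (R = B = C = 2 with m = 3 scores per cubicle), as are the function names; the sketch compares the computational formula above with the direct sum of squared deviations about the cubicle means.

```python
def within_cubicles_ss(cubicles):
    """Sum of all squared scores minus (1/m) times the sum of squared cubicle sums."""
    m = len(cubicles[0])
    sum_of_squares = sum(x * x for scores in cubicles for x in scores)
    subtractive = sum(sum(scores) ** 2 for scores in cubicles) / m
    return sum_of_squares - subtractive


def within_by_deviations(cubicles):
    """Direct definition: squared deviations of scores about their cubicle mean."""
    total = 0.0
    for scores in cubicles:
        mean = sum(scores) / len(scores)
        total += sum((x - mean) ** 2 for x in scores)
    return total


# Hypothetical data: RBC = 8 cubicles, each holding m = 3 independent scores.
cubicles = [[4, 6, 5], [7, 9, 8], [3, 3, 6], [10, 12, 11],
            [5, 8, 8], [6, 6, 9], [2, 4, 6], [9, 9, 12]]
m, n_cubicles = len(cubicles[0]), len(cubicles)
df_within = m * n_cubicles - n_cubicles      # mRBC - RBC

print(within_cubicles_ss(cubicles), within_by_deviations(cubicles), df_within)
```

Both routes give the same sum of squares, and the df is the number of scores less the number of cubicle means about which deviations are taken.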

With m independent scores per cubicle, the six computational formulas
(16.15) need only be modified by the use of 1/mRBC instead of 1/RBC as
the factor outside the brackets. It must be understood, however, that the
sums within the parentheses of formulas (16.15) will involve m times as many
scores as for the simpler situation with one case per cubicle. The computation
is again accomplished by auxiliary tables, the main cell entries of
which will, of course, also involve sums with m times as many scores. If
we think of the orderly arrangement of the original data, as exemplified in
Table 16.9, it will be seen that each cell in the separate block designations
will consist of m score entries; i.e., we will have m scores of the type
$X_{rbc}$. A more precise notation would be to let $X_{irbc}$ stand for the
score of the ith person in the rth row and cth column of the bth block, with
i taking on values of 1, 2, ···, m.

Except for the use of 1/mRBC in place of 1/RBC in formulas (16.15),
the computation of the between and simple interaction sums of squares
follows exactly the steps outlined for a single score per cubicle. The
three-way interaction sum of squares is again obtained by subtraction, but
now we must also deduct the within-cubicles sum of squares. Note that
in the formula of Table 16.10 which defines the three-way interaction term
we need to replace $X_{rbc}$ by $\bar{X}_{rbc}$, the mean of the m scores in the rth row and
cth column of block b.


CHOICE OF ERROR TERM IN THREE-WAY 
CLASSIFICATION 


The general mathematical model for the breakdown of a score in the
three-way classification setup may be written as

\[ (X_{rbck} - \mu) = \alpha_r + \alpha_b + \alpha_c + \alpha_r\alpha_b + \alpha_r\alpha_c + \alpha_b\alpha_c + \alpha_r\alpha_b\alpha_c + e_{rbck} \]

in which the subscripts r, b, and c refer to rows, blocks, and columns, and
k takes on values 1 ··· m, there being m independent replications (either
of measurement or of individuals) in each cell. The mean value of each
term on the right of the equality sign is zero; that is, all values are expressed
in deviation units. Note the manner in which the interactive effects are
designated: $\alpha_r\alpha_b$ is to be read as row by block interaction. Using notation
like that employed in specifying equations (16.12-16.14) from equation
(16.11) for two-way classification, we may replace the alphas by A's to
represent fixed values (fixed constants model) and by a's for classifications
involving samplings (random model). The mixed model would, of course,
contain one lower-case and two capital letters or two lower-case and one
capital.

Rather than rewrite the model equation with Latin letters specifying the
particular models, we can indicate the models by the following symbols:

$[A_rA_bA_c]$ for fixed constants model
$[a_ra_ba_c]$ for the random model
$[a_rA_bA_c]$ and $[a_ra_bA_c]$ for mixed models

It is assumed that the $a_r$, $a_b$, $a_c$, $a_ra_b$, $a_ra_c$, $a_ba_c$, $a_rA_b$, $a_rA_c$, $a_bA_c$,
$a_ra_ba_c$, $a_ra_bA_c$, $a_rA_bA_c$, and $e_{rbck}$ are random variates from normally
distributed populations of effects having the respective variances: $\sigma^2_r$,
$\sigma^2_b$, $\sigma^2_c$, $\sigma^2_{rb}$, $\sigma^2_{rc}$, $\sigma^2_{bc}$, $\sigma^2_{rB}$, $\sigma^2_{rC}$, $\sigma^2_{bC}$, $\sigma^2_{rbc}$, $\sigma^2_{rbC}$, $\sigma^2_{rBC}$, and $\sigma^2_e$ when
k = 1 ··· m represents measurement replication or $\sigma^2_i$ when k = 1 ··· m
involves replication of individuals. There is seldom, if ever, an opportunity
to check on the normality of the several interactive effects, a fact
which may be disturbing to the reader. No such assumptions are made
regarding the effects $A_r$, $A_b$, $A_c$, $A_rA_b$, $A_rA_c$, $A_bA_c$, and $A_rA_bA_c$, which are
associated with the fixed constants. Since all effects are expressed in terms
of deviation units, the sum of each particular set of effects, such as $a_r$ or $A_b$
or $a_rA_b$ or $A_rA_b$, is zero; that is, e.g., $\sum_r\sum_b a_rA_b = 0$.


In order to choose the appropriate variance estimate for the denominator of F for a given significance test, we again need to indicate just what each possible variance estimate ($s^2$) estimates under nonnull conditions. A summary statement will be given later regarding the assumption of homogeneity of variance for the several cases involving three-way classification (p. 337).

Case IX. Fixed constants model $[A_rA_bA_c]$, with m different individuals in each of the RBC cubicles. This is a simple, straightforward case in which $s^2_w \to \sigma^2_i$, and all the other seven $s^2$ values are estimates of $\sigma^2_i$ plus a single (possible) effect, the one to be tested. Examples:

$$s^2_r \to \sigma^2_i + \frac{mBC}{R-1}\sum A_r^2$$

and

$$s^2_{rb} \to \sigma^2_i + \frac{mC}{(R-1)(B-1)}\sum\sum (A_rA_b)^2$$

Thus $s^2_w$ is the proper error term for testing all three main effects, all three two-way interactions, and the three-way interaction. Generalizations are

the population(s) from which the mRBC persons were drawn, but
conclusions regarding main effe

spoken of as factors), will nee.

interaction is zero, but such an
assumption in psychological research is hazardous.


person per cubicle but each person

$s^2_w$ which is an estimate of $\sigma^2_e$ rather than the needed estimate of $\sigma^2_i$. Now it might be thought that this $s^2_w$ could be used to test $s^2_{rbc}$ for the presence of three-way interaction, but note that since

$$s^2_{rbc} \to \sigma^2_i + \frac{m}{(R-1)(B-1)(C-1)}\sum\sum\sum (A_rA_bA_c)^2$$

and $s^2_w \to \sigma^2_e$, the division of $s^2_{rbc}$ by $s^2_w$ leads to a noninterpretable F (if significant) because there is no way of knowing whether the significance is due to individual differences (remember that $\sigma^2_i$ contains an error of measurement part) or to three-way interaction. Stated differently, the $s^2_{rbc}$ is an estimate in which error of measurement variance, true individual difference variance, and possible three-way interaction effects are all confounded, a term used to indicate that a given setup does not allow a disentangling of the sources of variation which enter into a particular estimate.

Case XI. Fixed constants model, with only one person supplying all scores, i.e., a score (or scores) under each of the RBC combinations of conditions. If we have m measures on the one person under each of the RBC conditions, $s^2_w \to \sigma^2_e$ and each of the other seven variance estimates has an expected value including $\sigma^2_e$ plus an effect. A significant F with $s^2_w$ as the error term permits only the conclusion that repetition of the experiment on this same person would be expected to yield similar results, a “generalization” which has no generality, and hence is worthless.

Case XII. Mixed model $[a_rA_bA_c]$. Typically, this will involve R individuals assigned to the rows with each measured at least once under the BC conditions. We have (with no measurement replication):

$$s^2_r \to \sigma^2_e + BC\sigma^2_r$$

$$s^2_b \to \sigma^2_e + C\sigma^2_{rb} + \frac{RC}{B-1}\sum A_b^2$$

$$s^2_c \to \sigma^2_e + B\sigma^2_{rc} + \frac{RB}{C-1}\sum A_c^2$$

$$s^2_{rb} \to \sigma^2_e + C\sigma^2_{rb}$$

$$s^2_{rc} \to \sigma^2_e + B\sigma^2_{rc}$$

$$s^2_{bc} \to \sigma^2_e + \sigma^2_{rbc} + \frac{R}{(B-1)(C-1)}\sum\sum (A_bA_c)^2$$

$$s^2_{rbc} \to \sigma^2_e + \sigma^2_{rbc}$$

Scrutiny of the foregoing expected values indicates that $s^2_{rbc}$ is appropriate for testing the B × C interaction, that $s^2_b$ should be tested against $s^2_{rb}$, and $s^2_c$ against $s^2_{rc}$. No test for $s^2_r$ is possible, but this is not serious since it would only be a test of the significance of individual differences. Nor is there a test for $s^2_{rb}$ and $s^2_{rc}$, the two interaction terms having to do with individual differences in reaction to the defined experimental conditions. We would need an estimate of $\sigma^2_e$ for this purpose; ordinarily, such individual by condition interactions are real.
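The choice of error terms for Case XII can be sketched in code. The following is only an illustrative sketch, not a procedure given in the text: the function name `mixed_three_way`, the layout `x[r][b][c]` (person, block, column), and the small data set are all assumptions made for the example. It computes the seven sums of squares for one score per cell and forms the three F ratios named above (blocks against row by block, columns against row by column, and B × C against the three-way term).

```python
from itertools import product


def mixed_three_way(x):
    """Sums of squares, df, mean squares, and F ratios for x[r][b][c].

    Rows (r) are persons (random); blocks (b) and columns (c) are fixed.
    One score per cell is assumed, with equal cell counts throughout.
    """
    R, B, C = len(x), len(x[0]), len(x[0][0])
    cells = list(product(range(R), range(B), range(C)))

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    g = mean(x[r][b][c] for r, b, c in cells)  # grand mean
    mr = [mean(x[r][b][c] for b in range(B) for c in range(C)) for r in range(R)]
    mb = [mean(x[r][b][c] for r in range(R) for c in range(C)) for b in range(B)]
    mc = [mean(x[r][b][c] for r in range(R) for b in range(B)) for c in range(C)]
    mrb = [[mean(x[r][b]) for b in range(B)] for r in range(R)]
    mrc = [[mean(x[r][b][c] for b in range(B)) for c in range(C)] for r in range(R)]
    mbc = [[mean(x[r][b][c] for r in range(R)) for c in range(C)] for b in range(B)]

    ss = {
        "r": B * C * sum((m - g) ** 2 for m in mr),
        "b": R * C * sum((m - g) ** 2 for m in mb),
        "c": R * B * sum((m - g) ** 2 for m in mc),
        "rb": C * sum((mrb[r][b] - mr[r] - mb[b] + g) ** 2
                      for r in range(R) for b in range(B)),
        "rc": B * sum((mrc[r][c] - mr[r] - mc[c] + g) ** 2
                      for r in range(R) for c in range(C)),
        "bc": R * sum((mbc[b][c] - mb[b] - mc[c] + g) ** 2
                      for b in range(B) for c in range(C)),
        "rbc": sum((x[r][b][c] - mrb[r][b] - mrc[r][c] - mbc[b][c]
                    + mr[r] + mb[b] + mc[c] - g) ** 2 for r, b, c in cells),
    }
    df = {"r": R - 1, "b": B - 1, "c": C - 1,
          "rb": (R - 1) * (B - 1), "rc": (R - 1) * (C - 1),
          "bc": (B - 1) * (C - 1), "rbc": (R - 1) * (B - 1) * (C - 1)}
    ms = {key: ss[key] / df[key] for key in ss}
    # Error terms follow the expected values listed for Case XII.
    f = {"b": ms["b"] / ms["rb"],
         "c": ms["c"] / ms["rc"],
         "bc": ms["bc"] / ms["rbc"]}
    return ss, df, ms, f


# Hypothetical scores: 3 persons, 2 block conditions, 2 column conditions.
scores = [[[3, 5], [4, 9]], [[2, 6], [7, 8]], [[1, 4], [5, 12]]]
ss, df, ms, f = mixed_three_way(scores)
print(f)
```

No F is formed for rows or for the two interactions involving persons, in keeping with the statement that these are not testable without an estimate of the error of measurement variance.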

Case XIII. Mixed model $[a_ra_bA_c]$, with one score per cell. Researches calling for this model in psychology are not plentiful. Suppose R children are observed under C different social conditions by B observers, each of whom rates (on a 10-point scale) each child in each of the situations for a particular aspect of behavior, e.g., social participation. Primary interest would be in the effect of the conditions (the $A_c$ effects) with secondary interest in observer bias (the raters being regarded as a sample of observers having $a_b$ “effects”) and possible interest in two-way interaction effects. For model $[a_ra_bA_c]$ the meaning of the several variance estimates is as follows:

$$s^2_r \to \sigma^2_e + C\sigma^2_{rb} + BC\sigma^2_r$$

$$s^2_b \to \sigma^2_e + C\sigma^2_{rb} + RC\sigma^2_b$$

$$s^2_c \to \sigma^2_e + \sigma^2_{rbc} + B\sigma^2_{rc} + R\sigma^2_{bc} + \frac{RB}{C-1}\sum A_c^2$$

$$s^2_{rb} \to \sigma^2_e + C\sigma^2_{rb}$$

$$s^2_{rc} \to \sigma^2_e + \sigma^2_{rbc} + B\sigma^2_{rc}$$

$$s^2_{bc} \to \sigma^2_e + \sigma^2_{rbc} + R\sigma^2_{bc}$$

$$s^2_{rbc} \to \sigma^2_e + \sigma^2_{rbc}$$

When we examine the foregoing expected values, we see that both $s^2_r$ and $s^2_b$ are testable against $s^2_{rb}$ as the denominator for the Fs and that $s^2_{rc}$ and $s^2_{bc}$ can be tested against $s^2_{rbc}$, but $s^2_{rb}$ itself is not testable. The great difficulty is that the main effect of primary interest, the $A_c$ effect, is not amenable to test unless we can assume either $\sigma^2_{rc}$ or $\sigma^2_{bc}$ (or both) to be zero. If $\sigma^2_{rc}$ were zero, $s^2_c$ could be tested against $s^2_{bc}$; if $\sigma^2_{bc}$ were zero we could use $s^2_{rc}$ to test $s^2_c$; if both were zero, we could use $s^2_{rbc}$ for testing the main effect variance, $s^2_c$. the assumption

that either of these two t in fact, the safest

, will be discussed later under the heading “Preliminary Tests and Pooling.”

Suffice it to say that model $[a_ra_bA_c]$ is not recommended.


(For the situation involving R children and B observers with main interest in the effect of the C conditions, we can simply sum or average the B possible ratings for each child for each of the C conditions, then use these sums or averages as the scores in a two-way mixed model setup with R rows and C columns, which leads to a straightforward test of the effect of the C experimental conditions.)

Case XIV. Random model $[a_ra_ba_c]$. If anyone should find a situation in which all three bases of classification involve sampling, he will need to know that the two-way interactions can be tested against $s^2_{rbc}$, but that there is no way of testing the main effects without making untenable assumptions regarding two-way interactions. This sad state of affairs is not too sad simply because experimentation involving the random model, three-way classification, is hard to come by.

Case XV. Mixed model $[a_rA_bA_c]$, but a pseudo three-way classification. Suppose a sample of R individuals in block 1, a sample of R different individuals in block 2, and so on. The B blocks represent B experimental conditions, or B levels for a factor, the effects of which are to be determined, and at the same time the C columns stand for another factor which is also to be evaluated. The B sets of R individuals are used because it is not feasible to use each person under each block condition. Or suppose the blocks stand for different groups (say, diagnostic) from each of which R cases are drawn at random. We wish to compare the groups and also the C conditions and perhaps the B × C interaction. This setup is often referred to as the “split-plot design,” the plot concept coming from agricultural experimentation. More recently, this design is said to involve “nesting”: one group of R persons is nested in one block, another set of R persons is nested in a second block, and so on, with never a move from nest to nest.

Let us re-examine Table 16.9 in order to determine how to set up the model for this situation. We first note that for Case XII the variation among the row means ($\bar{X}_{r\cdot\cdot}$) contributes to $s^2_r$ as an estimate of individual difference variation, whereas for Case XV each of these row means is an average for B different individuals; hence row means do not hold for individuals. We do, however, have individual difference variation within each block, as represented by means of the type $\bar{X}_{rb\cdot}$ (right-hand part of Table 16.9). Accordingly, we can anticipate a sum of squares for individual differences which will involve combining the sums of squares within each block; i.e., $C\sum\sum(\bar{X}_{rb\cdot} - \bar{X}_{\cdot b\cdot})^2$ with RB − B degrees of freedom. The resulting variance estimate may be labeled $s^2_i$ for individual differences.

In ordinary three-way classification (Case XII) the B sets of means of the type $\bar{X}_{rb\cdot}$ have to do with row (individual) by block interaction, an
iss peti Зе Ый RB MU RM 

3 with independent cases in each 


block, no block by row interaction is possible: a person cannot react differently from one block to another unless he has been measured under more than one block condition. Consider next the $\bar{X}_{r\cdot c}$ type of mean at the bottom of Table 16.9. These means ordinarily enter into row by column interaction, but in the present case each of these means is the average for B different individuals who just happened to have been assigned the same row number. Therefore, there can be no row by column interaction in the usual sense. We have, nevertheless, RB independent individuals in a total of RB (instead of R) rows; hence there could be a meaningful individual by column interactive effect (not testable with one score per cell, but present as a source of variation).


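Table 16.15. Modification of variance Table 16.10 for Case XV: R different and independent individuals in each block

Source              Sum of Squares                                                                                     df                 Variance Estimate
Individuals*        $C\sum\sum(\bar{X}_{rb\cdot} - \bar{X}_{\cdot b\cdot})^2$                                           RB − B             $s^2_i$
Blocks              $RC\sum(\bar{X}_{\cdot b\cdot} - \bar{X}_{\cdot\cdot\cdot})^2$                                      B − 1              $s^2_b$
Columns             $RB\sum(\bar{X}_{\cdot\cdot c} - \bar{X}_{\cdot\cdot\cdot})^2$                                      C − 1              $s^2_c$
B × C interaction   $R\sum\sum(\bar{X}_{\cdot bc} - \bar{X}_{\cdot b\cdot} - \bar{X}_{\cdot\cdot c} + \bar{X}_{\cdot\cdot\cdot})^2$   (B − 1)(C − 1)     $s^2_{bc}$
Remainder           $\sum\sum\sum(X_{rbc} - \bar{X}_{rb\cdot} - \bar{X}_{\cdot bc} + \bar{X}_{\cdot b\cdot})^2$          B(R − 1)(C − 1)    $s^2_h$
Total               $\sum\sum\sum(X_{rbc} - \bar{X}_{\cdot\cdot\cdot})^2$                                               RBC − 1

* The sum of squares for individuals is computed by substituting in

$$\frac{1}{C}\sum_b\sum_r\Bigl(\sum_c X_{rbc}\Bigr)^2 - \frac{1}{RC}\sum_b\Bigl(\sum_r\sum_c X_{rbc}\Bigr)^2$$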


react inconsistently from o

been subjected to different block conditions.

With the foregoing in mind, we may write the following specific model for Case XV:

$$(X_{ibc} - \mu) = a_i + A_b + A_c + A_bA_c + h_{ibc}$$

in which $a_i$ indicates individual d

after the first four parts have been s

remainder in Table 16.15 involves possible individual by column interaction, composed of ordinary row by column interaction within each block, then summed over blocks.

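The expected values of the several variance estimates are as follows (recall that $\sigma^2_i$ contains $\sigma^2_e$ as a component):

$$s^2_i \to \sigma^2_i$$

$$s^2_b \to \sigma^2_i + \frac{RC}{B-1}\sum A_b^2$$

$$s^2_c \to \sigma^2_e + \sigma^2_{ic} + \frac{RB}{C-1}\sum A_c^2$$

$$s^2_{bc} \to \sigma^2_e + \sigma^2_{ic} + \frac{R}{(B-1)(C-1)}\sum\sum (A_bA_c)^2$$

$$s^2_h \to \sigma^2_e + \sigma^2_{ic}$$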

From these values we see at a glance that $s^2_i$ is the error term for testing $s^2_b$, a test which is analogous to $s^2_b/s^2_w$ in the one-way classification setup for the difference between means of independent groups. For testing $s^2_c$ the remainder estimate, $s^2_h$, is appropriate. Since $s^2_h$ is, in part, an estimate of individual by column interaction, we find an analogue in the two-factor setup (Case VI, p. 313) for which the row by column interaction provides the correct variance estimate for testing column effects when the column means are correlated (based on the same or related or matched individuals).

The remainder variance estimate is also appropriate for testing the 
B x C interaction. Note that this interaction involves C means in each 
block that are independent of the C means in every other block but at the 
same time the C means within each block are not independent of each 
other. This interaction has a special meaning when B stands for different 
groups and C stands for C tests all scored in comparable standard score 
form. The column means for each block are the basis for a given group's 
profile; hence a test of the B x C interaction tells us whether there are 
significant differences among the profiles for the B groups. 

Caution: Case XV as here outlined calls for the same number of 
individuals per block (or group). 
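The breakdown for Case XV can be sketched in code. This is a minimal illustration under stated assumptions rather than the book's own computing routine: the function name `split_plot`, the layout `x[b][r][c]` (block, person within block, column), and the scores are hypothetical. Equal numbers of persons per block are assumed, as the caution above requires.

```python
def split_plot(x):
    """Sums of squares, df, and F ratios for scores x[b][r][c].

    Person r is nested in block b and is scored under each of C columns.
    """
    B, R, C = len(x), len(x[0]), len(x[0][0])
    n = B * R * C
    g = sum(v for blk in x for person in blk for v in person) / n
    m_b = [sum(sum(person) for person in blk) / (R * C) for blk in x]
    m_c = [sum(x[b][r][c] for b in range(B) for r in range(R)) / (B * R)
           for c in range(C)]
    m_rb = [[sum(person) / C for person in blk] for blk in x]  # person means
    m_bc = [[sum(x[b][r][c] for r in range(R)) / R for c in range(C)]
            for b in range(B)]

    ss = {
        "individuals": C * sum((m_rb[b][r] - m_b[b]) ** 2
                               for b in range(B) for r in range(R)),
        "blocks": R * C * sum((m - g) ** 2 for m in m_b),
        "columns": R * B * sum((m - g) ** 2 for m in m_c),
        "bc": R * sum((m_bc[b][c] - m_b[b] - m_c[c] + g) ** 2
                      for b in range(B) for c in range(C)),
        "remainder": sum((x[b][r][c] - m_rb[b][r] - m_bc[b][c] + m_b[b]) ** 2
                         for b in range(B) for r in range(R) for c in range(C)),
    }
    df = {"individuals": B * (R - 1), "blocks": B - 1, "columns": C - 1,
          "bc": (B - 1) * (C - 1), "remainder": B * (R - 1) * (C - 1)}
    ms = {key: ss[key] / df[key] for key in ss}
    f = {
        "blocks": ms["blocks"] / ms["individuals"],  # independent groups
        "columns": ms["columns"] / ms["remainder"],  # correlated means
        "bc": ms["bc"] / ms["remainder"],            # profile differences
    }
    return ss, df, f


# Hypothetical data: 2 groups, 3 persons per group, 3 tests per person.
groups = [[[5, 7, 9], [4, 8, 6], [6, 9, 12]],
          [[3, 4, 8], [2, 6, 5], [5, 5, 9]]]
ss, df, f = split_plot(groups)
print(df)
print(f)
```

The blocks F uses the individuals-within-blocks estimate as its denominator, whereas the columns and B × C tests use the remainder, which mirrors the two error terms described for this case.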

Assumption of homogeneity of variance. Cases IX, X, and XI require similar variances for all cubicles, but only Case IX permits a test of the assumption. For Cases XI, XII, XIII, and XIV it is assumed that error of measurement variance is the same from cubicle to cubicle. The assumption for these cases is not testable unless we have measurement replication with m scores per cubicle. Case XV assumes that the row variance within blocks is homogeneous from block to block when $s^2_i$ is used to test $s^2_b$, and that the row by column interaction within blocks is similar from block to block when either $s^2_c$ or $s^2_{bc}$ is tested against $s^2_h$. Both of these assumptions are testable since the required within-blocks estimates are computable.


PRELIMINARY TESTS AND POOLING 


When we discussed Cases XIII and XIV, we found that certain effects 
could not be tested without assuming that an interaction is zero. The 
temptation is to assume an interaction is zero if it fails to be significant 
when tested against an appropriate error term. The writers of textbooks 
on mathematical statistics are remarkably mum on this point, presumably 
because the situation gets too “iffy”: a main effect is significant if it 
reaches, say, the .05 level, and if a certain interaction was not significant at
a specified level. Under such circumstances a P for an effect ceases to have 
the same meaning as when unencumbered by conditional probabilities. 

Note that preliminary tests may have to do with the assumption of zero interaction in the numerator term of F (as for Case XIII) or in the denominator term (as for Cases II and X). Failure to satisfy the assumption of a zero interaction in the numerator will lead to too many “significant” Fs. Stated differently, significance for a main effect cannot be safely claimed because the numerator involves a possible confounding of interactive and main effects. Failure to satisfy an assumption of zero interaction in the denominator term will lead to too few significant Fs, which means that an obtained F possesses greater significance than its P indicates.

Preliminary tests are also used in connection with the “pooling” of sums of squares and of their dfs. To understand the meaning of pooling, let us consider Case IX in which all effects are testable against $s^2_w$. The advocated steps are: first, $s^2_{rbc}$ is tested against $s^2_w$. If this F is not significant at, say, the .05 level, the sum of squares for the three-way interaction term is combined with that of $s^2_w$, with the dfs also being summed. Dividing the

d df gives another estimate of variance for the

way interactions, which if insignificant provide additional sums of squares and dfs for adding to the pool already made up.

The claimed advanta 


freedom for the denominator, or error 


gain in df does not have an appreciable effect, in the sense that a smaller F 
is required for significance, except when $n_2$ is very small, say less than 8 or
10. It should be clearly noted that the gain in df by pooling does not lead 
to a reduction in the sampling errors of the means being tested. 

The use of preliminary tests as a basis for pooling is not nearly so defensible as textbooks written prior to 1951 would have us believe. The work of Paull† indicates that the usually advocated rule (that when F is less than the value required for the .05 level, pooling is permissible and advisable) is far from satisfactory. He sets up an elaborate set of rules leading to the decision “never pool” or “sometimes pool” or “always pool.” Space does not permit an exposition of his rules here. A simple rule to follow when the dfs are equal, or when unequal provided both are greater than 6, is to pool only when F is less than 2. Even when we follow the rules, Fs based on pooling do not lead to Ps of precisely the same meaning as Ps obtained from Fs which do not involve pooling.
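The simple rule just stated can be written out as a short sketch. The function names `rule_applies` and `pooled_error` and the numerical values are hypothetical and serve only to illustrate the arithmetic; Paull's fuller set of rules is not reproduced here.

```python
def rule_applies(df_a, df_b):
    """The simple rule is stated for equal dfs, or for dfs both greater than 6."""
    return df_a == df_b or min(df_a, df_b) > 6


def pooled_error(ss_error, df_error, ss_inter, df_inter, f_limit=2.0):
    """Return (variance estimate, df, pooled?) for the error term.

    The interaction is first tested against the error estimate; the two
    sums of squares and their dfs are combined only when F is below f_limit.
    """
    ms_error = ss_error / df_error
    ms_inter = ss_inter / df_inter
    f = ms_inter / ms_error
    if rule_applies(df_error, df_inter) and f < f_limit:
        return (ss_error + ss_inter) / (df_error + df_inter), df_error + df_inter, True
    return ms_error, df_error, False


# Hypothetical values: within-cells SS of 135 on 27 df, three-way SS of 20 on 8 df.
estimate, df, pooled = pooled_error(135.0, 27, 20.0, 8)
print(estimate, df, pooled)
```

Here F is 2.5/5.0 = 0.5, so the two sums of squares are combined, giving 155/35 on 35 df. As noted above, the gain in df does not shrink the sampling errors of the means being compared.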


HIGHER-ORDER CLASSIFICATION 


There are times when it is both desirable and feasible to study the 
variations of a dependent variable associated with variations in more than 
three variables. For such a study the data are classifiable in more than three 
ways. We have already mentioned the setup in which an observation is 
made on each of m individuals under each of the combinations of condi- 
tions defined by rows, blocks, and columns. There will be RBC scores for 
each individual, and the scores may be classified not only as belonging to a 
given row and a specified column of a particular block but also as belong- 
ing to a certain individual. Although it is easy to make an orderly 
arrangement of the data for quadruple classification, the required compu- 
tations become somewhat burdensome. For the situation involving a 
fourth classification, based on either individuals or on a fourth independent 
variable, there will be 16 sums of squares: 1 for total, 4 for between groups, 
6 for two-way interactions, 4 for three-way interactions, and 1 for four-way 
interaction. When five classifications are used we will have sums of squares 
for the total, 5 betweens, 10 simple interactions, 10 triple interactions, 
5 quadruple interactions, and 1 fifth-order interaction. It is not within the 
scope of this book to outline the computations for these higher-order 
classifications.‡
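The counts of sums of squares quoted above follow from counting the ways of choosing one, two, three, or more of the classifications. A minimal sketch, in which the function name `sums_of_squares_count` is an assumption made for illustration:

```python
from math import comb


def sums_of_squares_count(k):
    """Count the sums of squares for a complete k-way classification.

    Returns a dict mapping order (1 = main effects, 2 = two-way
    interactions, and so on) to its count, plus the overall number of
    sums of squares including the one for the total.
    """
    counts = {order: comb(k, order) for order in range(1, k + 1)}
    return counts, 1 + sum(counts.values())


print(sums_of_squares_count(4))  # ({1: 4, 2: 6, 3: 4, 4: 1}, 16)
print(sums_of_squares_count(5))  # ({1: 5, 2: 10, 3: 10, 4: 5, 5: 1}, 32)
```

The overall count is always 2 raised to the power k, which is one way to see how quickly the computations grow as classifications are added.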

The possibilities of the variance technique as a method of extracting 
from one set of data information regarding not only primary effects but 


† Paull, A. E., On a preliminary test for pooling mean squares in the analysis of variance, Annals math. Stat., 1950, 21, 539-556.

‡ See Edwards, A. L., and Horst, P., The calculation of sums of squares for interaction in the analysis of variance, Psychometrika, 1950, 15, 17-24.
also interactions have, at times, led to rather indiscriminate inclusions of
variables. For instance, a classification of subjects as male or female may 
be made in order to determine possible sex differences. Since the typical 
experiment for which the variance technique is used is likely to be based on 
a relatively small number of subjects, it is very doubtful whether any 
information of value will be added to the sum total of the already incon- 
sistent findings concerning sex differences. 

Those who carry out studies involving more than three-way classification 
encounter great difficulty in interpreting significant higher-order inter- 
actions. Some have thought it safe after ascertaining the sums of squares 
for the primaries and the two-way and three-way interactions, to use the 
remainder variance, which is a composite of untested higher-order inter- 
actions, as an error term. Such a practice assumes insignificance for the 
interactions whose sums of squares are thus allowed to combine, but since 
there are instances of significant four-way interaction, the cautious 
investigator will extract and test all the possible interactions before using 
such a remainder as the error term for F. 

As a matter of fact, the choice of the proper error term for higher-order classifications is, at times, quite complicated. For the simple four-way setup involving the fixed constants model, with m replications of individuals per cubicle, the $s^2_w$ estimate is the correct error term for testing all 4 main effects and all 11 interactions. For the mixed four-way model with $a_r$ standing for individuals (a typical setup), the main effects for the three fixed constants factors are tested against the respective two-way interactions involving individuals, the three possible two-way interactions among the three fixed factors are testable against the appropriate three-way interactions involving individuals, and the three-way interaction for the three fixed factors can be tested against the four-way interaction. No interactions involving persons can be tested nor can the main individual difference effect be tested. If anyone cooks up a research calling for the mixed model with two random and two fixed constants factors, he should be told that the three variances of principal interest (the two main effects for the fixed factors and their interaction) are not testable in any exact way.

FACTORIAL AND LATIN SQUARE DESIGNS 


The student who encounters the term “factorial design” will need to know that it is difficult to make a distinction between factorial design and the analysis of variance setups discussed in this chapter. The bases for classification are referred to as factors; the categories within a classification are termed “levels.” Perhaps the term factorial design is inappropriate when one basis for classification is persons.

The Latin square design had its origins in agricultural experimentation. If T different treatments (fertilizers) are to be evaluated, a plot of land is laid off into T rows and T columns and the treatments are so assigned that each treatment occurs only once in each row and only once in each column. With Latin letters standing for the treatments, there might be the accompanying square, an examination of which reveals that this is a scheme for balancing out the effects of possible fertility differentials from row to row and also from column to column.

                Columns
Rows      1    2    3    4
I         A    D    B    C
II        B    A    C    D
III       C    B    D    A
IV        D    C    A    B
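The defining property of a Latin square, each treatment once in every row and once in every column, is easy to check mechanically. The following sketch is illustrative only; `cyclic_latin_square` and `is_latin_square` are hypothetical helper names, and the cyclic construction is just one convenient way of producing a valid square, not the arrangement used in the text.

```python
def cyclic_latin_square(letters):
    """Build a T x T square by shifting the letters one place per row."""
    T = len(letters)
    return [[letters[(row + col) % T] for col in range(T)] for row in range(T)]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and column."""
    T = len(square)
    symbols = set(square[0])
    if len(symbols) != T:
        return False
    if not all(len(row) == T and set(row) == symbols for row in square):
        return False
    return all({square[r][c] for r in range(T)} == symbols for c in range(T))


square = cyclic_latin_square("ABCD")
for row in square:
    print(" ".join(row))
print(is_latin_square(square))
```

In practice a square is usually chosen at random from the admissible arrangements rather than taken in cyclic form; the check itself does not depend on how the square was produced.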

Some researchers in psychology have used the Latin square principle 
as a way of balancing the effect of individual differences and order of 
testing. That is, with T conditions to be evaluated, the rows stand for T 
individuals and the columns for T orders of testing, with Latin letters 
representing the T conditions. The design also can be and has been used in 
lieu of a complete three-way factorial design when all three factors involve 
the same number of levels. For example, sixteen properly arranged obser- 
vations may be used instead of the sixty-four observations required for a 
complete three-way classification plan with four levels per classification. 
This second use of the Latin square principle is not for the purpose of 
balancing out the effect of a factor but rather for evaluating the effect of 
factors which are deliberately varied. 

Thus, it would seem that the Latin square design might be very useful 
in psychology, but before we accept it uncritically (as some advocates 
have), we need to examine the underlying mathematical model, which may 
be written as 

$$(X_{rct} - \mu) = \alpha_r + \alpha_c + \alpha_t + f_{rct}$$

The α's refer to row, column, and treatment effects, and $f_{rct}$ is a remainder, or residual. It follows from the model that the breakdown of the total sum of squares and degrees of freedom will lead to sums of squares for rows, for columns, and for treatments, each with T − 1 degrees of freedom. These sums of squares will use up 3T − 3 of the total df, $T^2 - 1$; hence there remain $T^2 - 3T + 2$ degrees of freedom for the residual sum of squares, which provides the error term for testing $s^2_r$, $s^2_c$, and $s^2_t$.
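The breakdown just described can be sketched as follows. This is a hedged illustration, not the book's computing scheme: the function name `latin_square_anova`, the letter arrangement, and the nine scores are hypothetical. The residual sum of squares is obtained by subtraction, and its df is $T^2 - 3T + 2$.

```python
def latin_square_anova(scores, letters):
    """Sums of squares and df for a T x T Latin square.

    scores[r][c] is the observation in row r and column c;
    letters[r][c] names the treatment applied in that cell.
    """
    T = len(scores)
    g = sum(map(sum, scores)) / T ** 2
    row_means = [sum(row) / T for row in scores]
    col_means = [sum(scores[r][c] for r in range(T)) / T for c in range(T)]
    by_treatment = {}
    for r in range(T):
        for c in range(T):
            by_treatment.setdefault(letters[r][c], []).append(scores[r][c])
    trt_means = [sum(values) / T for values in by_treatment.values()]

    ss = {
        "rows": T * sum((m - g) ** 2 for m in row_means),
        "columns": T * sum((m - g) ** 2 for m in col_means),
        "treatments": T * sum((m - g) ** 2 for m in trt_means),
    }
    total = sum((v - g) ** 2 for row in scores for v in row)
    ss["residual"] = total - sum(ss.values())  # what the three effects leave over
    df = {"rows": T - 1, "columns": T - 1, "treatments": T - 1,
          "residual": T * T - 3 * T + 2}
    return ss, df


# Hypothetical 3 x 3 example.
letters = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
scores = [[8, 6, 7], [5, 9, 4], [6, 5, 10]]
ss, df = latin_square_anova(scores, letters)
print(ss)
print(df)
```

With T = 3 the residual has only 2 degrees of freedom, which illustrates how little information a single small square provides for the error term.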

When the foregoing model is compared with that for the complete 
three-way classification (p. 331), we see a marked difference: the absence 
of interaction terms. For the Latin square design it is assumed that all 
interactions are zero. This assumption is necessary for three (not necessarily
independent) reasons: (1) there are not enough degrees of freedom
available for taking out possible interactions, (2) the main effects are 
confounded with interactions, and (3) the residual simply does not provide 
an error term appropriate for testing any of the three main effects. These 
considerations can be made more explicit by examining the expected values 
of the several variance estimates. 

For the fixed effects model $[A_rA_cA_t]$, which may more aptly be specified as $[A_rA_cA_b]$ since there are T levels for each of three factors rather than just T “treatments” for the one factor designated by blocks, the expected mean squares are (aside from a common $\sigma^2_e$):


roeg (ЕММИ BEAM ro 
eas peg mor (roar Ттт? 
прет 2} ХХХ (4,4,4? x У (4,4, т . 
=н) Toy “т-у re^ 
These expected values are for the Latin square design used in lieu of a complete three-way layout, fixed effects model, where the interest is in testing all the main effects, that is, the effect of the factor assigned to rows, of that assigned to columns, and of that assigned to blocks. We see immediately that the possible presence of interactions snarls the obtaining of a valid F ratio for any of the three main effects. This sad state of affairs can be avoided by using the regular three-way classification design. This means more work, to be sure, but the rewards are twofold: main effects are readily testable and interactions also can be extracted and tested.

Table 16.16 has been prepared for the reader who is puzzled by the manner in which main effects are confounded with interaction effects in the foregoing expected values and who is also curious as to how to proceed to set up a Latin square in lieu of a complete three-way fixed constants design. In this table we presume, for purposes of illustration, that all scores in blocks A, B, and C are population means. All row means, all column means, and all block means are equal to 4; that is, there are no main effects at all. In each block there is the same row by column interaction; thus summing through blocks and dividing by 3 will yield means that will show row by column interaction, but since this interaction is exactly the same within each block there is no three-way interaction. The boldface numerals in the three blocks are the “scores” for the 3 × 3 Latin square to the right. Each of these boldface values enters the Latin square with its row and column designation intact and with its block source designated by A or B or C. For the Latin square so generated, it will be seen that the row means are all 4; ditto, the column means. But for

Table 16.16. A Latin square generated from a three-way layout

Row    Block A            Block B            Block C            Square
I      **4**   4    4     4   **4**   4      4    4   **4**     4A   4B   4C
II       2     4  **6**   **2**   4     6    2  **4**   6       2B   4C   6A
III      6   **4**  2     6     4   **2**    **6**  4    2      6C   4A   2B

the block effect we have from the Latin square the following means:

$\bar{X}_A = (4 + 6 + 4)/3 = 4.67$

$\bar{X}_B = (4 + 2 + 2)/3 = 2.67$

$\bar{X}_C = (4 + 4 + 6)/3 = 4.67$

which are illusory as indications of a main effect because the effect was produced by the row by column interaction; no block differences held for the starting three-way situation. Compare this outcome with the expected value for $s^2_b$ and note that the row by column interaction is not involved in the expected values for $s^2_r$ and $s^2_c$.
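The illusory block effect can be reproduced numerically. The sketch below assumes the layout implied by the three means quoted above: every block contains the same 3 × 3 pattern of population means, and the letters are arranged cyclically so that the selected cells match the reported sums. Both the pattern and the arrangement are reconstructions made for this illustration, and the variable names are hypothetical.

```python
# The same row-by-column pattern of population means in every block
# (assumed layout; all row, column, and block means equal 4).
pattern = [[4, 4, 4],
           [2, 4, 6],
           [6, 4, 2]]
blocks = {"A": pattern, "B": pattern, "C": pattern}

# Assumed letter arrangement for the 3 x 3 Latin square.
letters = [["A", "B", "C"],
           ["B", "C", "A"],
           ["C", "A", "B"]]

# Each cell of the square takes its score from the block named by its letter.
square = [[blocks[letters[r][c]][r][c] for c in range(3)] for r in range(3)]

row_means = [sum(row) / 3 for row in square]
col_means = [sum(square[r][c] for r in range(3)) / 3 for c in range(3)]
block_means = {
    name: sum(square[r][c] for r in range(3) for c in range(3)
              if letters[r][c] == name) / 3
    for name in "ABC"
}

print(row_means)    # every row mean is 4
print(col_means)    # every column mean is 4
print({name: round(value, 2) for name, value in block_means.items()})
```

The row and column means all equal 4, yet the block means come out as 4.67, 2.67, and 4.67, although no block differences exist in the full layout. The apparent effect is produced entirely by the row by column interaction.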

For the second, more common use in psychology of the Latin square, with rows standing for persons (animals), columns for order or sequence in testing, and Latin letters for experimental conditions (treatments), we have a mixed model $[a_rA_cA_t]$ with rows as random variates. The expected mean squares are (again omitting the common $\sigma^2_e$):

The primary interest is in testing $s^2_t$, but we see no suitable error term unless it can be assumed that the order by treatment interaction is zero. Such an assumption is equivalent to saying that the influence of the order A, D, B, C (see Latin square on p. 341) is the same as the influence of the order B, A, C, D; and so on. Whatever the order effect, whether it be practice, fatigue, boredom, something physiological, change in mental set, etc., it must be assumed that any such effects or combination thereof are independent of particular treatments. If, for example, treatments were various drugs, differences in residual effects would lead to order by treatment interaction.

The reader will have noted that when F is taken as $s^2_t/s^2_{res}$, the presence of order by treatment interaction will militate against getting a significant F, and that if F reaches the α level of significance he can claim significance at better than the α level, though how much better remains unknown. The reader will have also noted that for a (typically) small number of treatments, a single Latin square design uses so few cases (T in number) that sampling errors will tend to be very large. The advantages of larger N can be attained by replication: additional sets of T persons provide additional Latin squares, for a discussion of which the reader is referred to Cochran and Cox.§ And, finally, the reader may not have noted that the presence, in the expectations for both $s^2_t$ and $s^2_{res}$, of the three interactions involving rows indicates that the $T\sigma^2_t$ component must be relatively sizable in order to lead to an appreciable F.


§ Cochran, W. G. and Cox, G. M. Experimental designs, 2nd ed., New York: John Wiley, 1957.
SELECTED COMPARISONS


As in one-way classification, when F indicates that a main effect is 
significant, we may proceed to test specific contrasts among the means. 
The reasons for doing this and the general procedures are the same as those 
set forth in the last section of Chapter 15, which the reader should review 
at this time. We limit the discussion here to the two-way classification 
setup, fixed effects model with equal ms in the cells and the mixed model. 
For the column means (or the row means), we could have a D or a D' 
computed exactly as before for the case of equal ms. 

In the fixed-effects situation, the needed standard error for a contrast is given by the square root of

$$s^2_{D'} = \frac{s^2_w}{mR}\left(\frac{1}{a} + \frac{1}{b}\right)$$

in which $s^2_w$ is the within cells variance estimate and a and b are the number of means being averaged for a contrast. Again, when a = b = 1, we have the error for a D-type contrast. The significance of a contrast springing from an a priori hypothesis can be ascertained from $t = D/s_D$ (or $t = D'/s_{D'}$), with df = mRC − RC. A contrast of the data-snooping variety will be judged significant at the α level if $D/s_D$ (or $D'/s_{D'}$) reaches K, where K is now defined as the square root of the product of (C − 1) times the F required for the α level of significance for $n_1 = C - 1$ and $n_2 = mRC - RC$ degrees of freedom. For comparisons involving row means, R and C are simply interchanged.
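The two quantities just defined can be computed directly. In the sketch below the function names `contrast_se` and `scheffe_k` and all numerical inputs are hypothetical; the critical F is supplied by the user from an F table for $n_1 = C - 1$ and $n_2 = mRC - RC$ degrees of freedom rather than computed.

```python
from math import sqrt


def contrast_se(s2_w, m, R, a, b):
    """Standard error of a contrast between column means (fixed effects).

    s2_w is the within cells variance estimate; each column mean rests on
    m * R scores; a and b are the numbers of means averaged on each side.
    """
    return sqrt(s2_w / (m * R) * (1 / a + 1 / b))


def scheffe_k(C, f_crit):
    """K for data-snooping contrasts among C column means."""
    return sqrt((C - 1) * f_crit)


# Hypothetical values: within cells variance 20, m = 5 per cell, R = 2 rows.
se = contrast_se(20.0, m=5, R=2, a=1, b=1)   # D-type contrast
print(se)                                     # 2.0

# f_crit must be read from an F table; 3.0 is a placeholder, not a table value.
k = scheffe_k(C=4, f_crit=3.0)
print(k)                                      # 3.0

D = 7.0                                       # hypothetical difference between two means
print(D / se >= k)                            # contrast reaches K?
```

Because K exceeds the ordinary critical t for the same df, a contrast suggested by inspection of the data must be larger than an a priori contrast to be judged significant.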

For the mixed model with C means based on the same R persons (or R sets of matched individuals), a contrast of the D type will have $s_D = \sqrt{2s^2_{rc}/R}$. Given an a priori hypothesis, we have $t = D/s_D$ with (R − 1)(C − 1) degrees of freedom whereas for a contrast suggested by an examination of the data, $D/s_D$ must reach K, which this time is the square root of the product of (C − 1) times the F required for α with $n_1 = C - 1$ and $n_2 = (R - 1)(C - 1)$ degrees of freedom.

It should be noted that for the mixed model situation neither the procedure involving t nor that involving K makes any allowance for the possibility that the correlation between the scores in the columns involved in a particular contrast may differ from the averages of the C(C − 1)/2 intercorrelations entering into $s^2_{rc}$. The value of t could, of course, be calculated independently of the over-all row by column interaction, but it is not clear whether the Scheffé method permits this alteration.

Apparently neither the t approach nor the Scheffé method is applicable for contrast of the D' type in the mixed model, but there appears to be little need for D' comparisons in the mixed model situation.


Chapter 17 


TRENDS AND DIFFERENCES 
IN TRENDS 


So-called trend analysis is, in essence, a part of the larger problem of the 
relationship between variates when we have an independent-dependent 
variable situation. Correlational analysis is appropriate for specifying 
relationships between individual difference variables regardless of whether 
or not one variable can be characterized as dependent on the other as an 
independent variable. When it can be argued that one variable is dependent 
(consequent) and the other independent (antecedent), there may be some 
interest in the regression of the dependent on the independent variable, 
both variates being individual difference variables. We have already given 
methods for testing the significance of regression coefficients (p. 142), 
for the equivalent testing of the significance of linear regression (p. 272), 
for testing linearity (p. 275), and for testing the difference between re- 
gression coefficients based on independent samples (p. 143). 

Although our discussion of the analysis of variance has been mainly 
concerned with the significance of the differences between means, the 
perceptive reader will have noted that when a basis of classification involves 
an ordered variable, or factor, such as distance, degree of illumination, 
size, etc., which is manipulable as an independent variable, the F test for a 
main effect is really concerned with whether or not some dependent 
variable, Y, is being affected by the factor. That is, is Y as a dependent
variable influenced by or related to the manipulated variable? This may
be regarded as a question of regression (most mathematical statisticians
subsume all analysis of variance under regression analysis) or more simply
a question of trend and its form. For this situation the correlation
coefficient ceases to be a useful descriptive term, but the presence of linear
trend and the slope thereof is of interest as will also be the possible
curvilinearity of the relationship. Or differences among trends may be of 
primary interest. 

Some of the techniques to be presented in this chapter are frequently 
subsumed under the topic “Orthogonal Polynomials.” 

Review and recast. When we have G levels on a factor, or independent 
variable, with m different individuals randomly assigned into each of the 
G groups, we have a one-way classification design (possible analyses 
suggested on pp. 270-81) with Y as the dependent and X as the indepen- 
dent variable. The ms per group need not be the same, although equal ms 
are preferable. 

When we have C levels on the ordered factor and R levels on a second 
ordered factor (or R conditions not orderable), with m independent cases 
assigned to each of the RC cells, we have a two-way design. A plot of the
X means against the C values (or levels), done separately for each of the R 
levels (or conditions), will permit the drawing of R trend lines (as in 
Figs. 16.1-16.4, pp. 307-08). Or a plot of the appropriate X means 
against the R values (or levels for the factor identified with the rows), this 
time separately for the C levels, will permit the drawing of C trend lines. 
The test of the R x C interaction provides a test of the difference
between the R trend lines (or the C trend lines when the row factor is
ordered).

When we have C levels for one ordered variable and B levels on a
second factor (quantitative or qualitative) with each of R individuals
measured under all the BC combinations of conditions (a three-way,
mixed model), a test of $s^2_{bc}$ against $s^2_{rbc}$ is a test of the difference between
the B trends plotted with appropriate X means against the column factor
(or between the C trends when X means are plotted against the B levels
when blocks stand for an ordered variable). Note that the C means
entering into the trend for each of the B levels are correlated (based on the
same individuals); ditto, the B means for C trends. The use of $s^2_{rbc}$ as the
error term allows for the correlation.

If for the B levels we used B sets of different persons, R persons per set, 
the C means for the trend of X against the C levels would again be cor- 
related but the B trend lines would be uncorrelated. The test of the B x C 
interaction, as specified in Case XV, p. 335, is appropriate for testing the 
differences among the B trends. 

A significant interaction in any of the foregoing types of situations 
simply means that the trends or curves are not parallel, regardless of their 
general shape, or the form of the relationships. A presentation of the 
trend lines or a description thereof is necessary for an interpretation of any 
claimed statistically significant differences among the trends.


LINEAR TRENDS


The specification and testing of linear trends in psychology is of special 
interest for two reasons: (1) many relationships are linear in form, 
sometimes predictably so from theory, and (2) the question as to whether a 
relationship is nonlinear is readily approached via a test of departure from 
linearity. Although we have already set forth a method (pp. 272-75) for 
testing linear trend (linear regression) and for testing nonlinearity, there is 
a somewhat shorter approach which is applicable only when the levels on 
the factor being varied experimentally are evenly spaced and there are m 
scores (measures) at each of the G levels. We will need to distinguish 
between two situations: (1) when the m scores are uncorrelated from level 
to level (i.e., m independent cases assigned randomly to the G groups) 
and (2) when the m scores are correlated (i.e., based on just m persons or 
m sets of matched persons). The first of these will involve one-way, the
other two-way, analysis of variance but some computational methods
developed for the first will also be applicable to the second.


Linear trend: uncorrelated observations. First, a little algebra. It 
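will be recalled that the regression sum of squares was shown to be equal to
$Nr^2S^2_y$, where Y is the dependent variate and X the independent variate,
the variable for which we now choose the G levels. We have

$$\sum (Y' - \bar{Y})^2 = Nr^2S^2_y = N\left(\frac{\sum xy}{NS_xS_y}\right)^2 S^2_y = \frac{(\sum xy)^2}{NS^2_x} = \frac{(\sum xy)^2}{\sum x^2}$$

But

$$\begin{aligned}
\sum xy &= \sum (X - \bar{X})(Y - \bar{Y}) \\
&= \sum XY - \bar{Y}\sum X - \bar{X}\sum Y + \sum \bar{X}\bar{Y} \\
&= \sum XY - \bar{Y}N\bar{X} - \bar{X}\sum Y + N\bar{X}\bar{Y} \\
\sum xy &= \sum XY - \bar{X}\sum Y
\end{aligned}$$

Now consider the sum, $\sum xY$, with x in deviation units and Y in original Y
units.

$$\sum xY = \sum (X - \bar{X})Y = \sum XY - \bar{X}\sum Y$$

Thus $\sum xy = \sum xY$.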
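
The identity just derived is easy to confirm numerically. The following is a small check on made-up scores (the data and the function name are assumptions for illustration only): the cross-product sum in deviation units, the sum of deviation x times raw Y, and the raw-score form all agree.

```python
def cross_product_sums(xs, ys):
    """Return (sum xy in deviation units, sum of deviation-x times raw Y,
    raw-score form sum XY - mean(X) * sum Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_raw = sum((x - mean_x) * y for x, y in zip(xs, ys))
    raw_form = sum(x * y for x, y in zip(xs, ys)) - mean_x * sum(ys)
    return dev_dev, dev_raw, raw_form


# Made-up scores, used only to exercise the identity.
X = [1, 2, 3, 4, 5, 6]
Y = [3, 5, 4, 8, 9, 13]
dev_dev, dev_raw, raw_form = cross_product_sums(X, Y)
print(dev_dev, dev_raw, raw_form)
```

Because only X needs to be in deviation form, the Y scores can be left as raw sums per level, which is what makes the coding scheme that follows convenient.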
To simplify computations we may code the X variates into numerically 
small values, with a mean of zero so as to possess one property of deviation 
scores. Let us use v for the coded values of the G X values, or points, used
to define the G levels. If G, the number of levels, is an odd number, we can
assign a v of 0 to the middle level and have coded values of

..., -4, -3, -2, -1, 0, 1, 2, 3, 4, ...

and if G is an even number we can assign -1 and +1 to the two middle
levels and have coded values of

..., -7, -5, -3, -1, 1, 3, 5, 7, ...

Let $v_g$, with $g = 1, \ldots, G$, be the coded value for the gth group (level) and
let $Y_g$ be the Y scores for those in the gth group. Then

$$\begin{aligned}
\sum_g \sum v_g Y_g &= \sum v_1 Y_1 + \cdots + \sum v_g Y_g + \cdots + \sum v_G Y_G \\
&= v_1 \sum Y_1 + \cdots + v_g \sum Y_g + \cdots + v_G \sum Y_G
\end{aligned}$$

Simply sum the m Y scores for each group (level), multiply by the v for the 
level, and sum over groups, thus obtaining what we will designate as $\sum vY$
instead of $\sum v_g Y_g$, a more exact symbolization. (The G separate sums of Y
scores will already have been obtained when computing the total,
between-groups, and within-groups sums of squares.)

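The regression sum of squares, $(\sum xy)^2/\sum x^2$, will be $(\sum vy)^2/\sum v^2$
$= (\sum vY)^2/\sum v^2$ in terms of the vs, or coded Xs. With m cases per level, we
have $\sum v^2 = \sum_g m v^2_g = m\sum v^2_g$. Simply square the (numerically small) v
values, sum, and multiply by m. Thus, we have for the regression sum of
squares,

$$\sum\sum (Y' - \bar{Y})^2 = m\sum_g (Y'_g - \bar{Y})^2 = \frac{(\sum vY)^2}{m\sum v^2_g}$$

which, since it has 1 degree of freedom, corresponds to the variance estimate
for linear regression of p. 274 and will be symbolized here by $s^2_l$.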
This is sometimes called the variance estimate for the linear component. 
It must not be forgotten that the foregoing computationally simple 
approach holds only for equal spacings for the levels on the independent 
variable, X, and for equal ms in the G groups (at the G levels of X). 
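
As a sketch of the shortcut, the code below builds the v codes for any number of levels and compares the coded computation with the direct regression sum of squares on the raw X values. The scores, the level values, and the function names are invented for illustration; equal spacing and equal m are assumed, as the text requires.

```python
def coded_values(G):
    """v codes: unit steps around 0 for odd G, steps of 2 around -1, +1 for even G."""
    if G < 2:
        raise ValueError("need at least two levels")
    if G % 2 == 1:
        half = G // 2
        return list(range(-half, half + 1))
    return list(range(-(G - 1), G, 2))


def linear_ss(groups):
    """Linear-component sum of squares, (sum vY)^2 / (m * sum v^2)."""
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("equal m per level is required for this shortcut")
    v = coded_values(len(groups))
    sum_vy = sum(vg * sum(g) for vg, g in zip(v, groups))
    return sum_vy ** 2 / (m * sum(vg * vg for vg in v))


def regression_ss(xs, ys):
    """Direct route on raw scores: (sum xy)^2 / sum x^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy ** 2 / sxx


# Hypothetical data: G = 4 equally spaced levels of X, m = 3 scores per level.
levels = [10, 20, 30, 40]
groups = [[4, 6, 5], [7, 9, 8], [9, 12, 9], [13, 12, 14]]
xs = [x for x, g in zip(levels, groups) for _ in g]
ys = [y for g in groups for y in g]
print(coded_values(4), linear_ss(groups), regression_ss(xs, ys))
```

The two routes agree because the v codes are a linear transformation of the equally spaced X values, and the regression sum of squares is unchanged by such a transformation.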

By computational methods already given (formulas 15.6-15.8) we can 
obtain the total, the within-levels, and the between-levels sums of squares 
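for the Ys. Recall from p. 270 that

$$\sum\sum (Y - \bar{Y})^2 = \sum\sum (Y - \bar{Y}_g)^2 + m\sum_g (\bar{Y}_g - \bar{Y})^2$$

and from p. 276 that

$$m\sum_g (\bar{Y}_g - \bar{Y})^2 = m\sum_g (\bar{Y}_g - Y'_g)^2 + m\sum_g (Y'_g - \bar{Y})^2$$

From the last equation we see that

$$m\sum_g (\bar{Y}_g - Y'_g)^2 = m\sum_g (\bar{Y}_g - \bar{Y})^2 - m\sum_g (Y'_g - \bar{Y})^2$$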

provides a way of calculating the sum of squares for the deviation of the 
array, or group, means from linear form. 

The breakdown of the total sum of squares along with dfs and variance 
estimates may be assembled, as in Table 17.1. It will be noted that this
table does not contain the "residual from line" component of Table 15.4,
which was used there as the error term for testing the significance of $s^2_l$.
Actually, we do not need this as an error term; we may take $F = s^2_l/s^2_w$.
The $n_2$ for the F table will be mG - G, a value that will be somewhat
(usually slightly) smaller than the $n_2 = mG - 2$ when the variance
estimate based on the residuals about the line is used as error. This slight
loss in df will, in the long run, be compensated for by the fact that $s^2_w$
tends always to be smaller than the variance estimate based on the residuals
about the line.


Table 17.1. Analysis of variance for linear trend, Y as dependent on G levels of X


Source                            Sum of Squares                       df        Var. Est.
Between levels                    $m\sum_g (\bar{Y}_g - \bar{Y})^2$    G - 1     $s^2_b$
  Linear trend                    $m\sum_g (Y'_g - \bar{Y})^2$         1         $s^2_l$
  Deviation of means from line    $m\sum_g (\bar{Y}_g - Y'_g)^2$       G - 2     $s^2_d$
Within levels                     $\sum\sum (Y - \bar{Y}_g)^2$         mG - G    $s^2_w$
Total                             $\sum\sum (Y - \bar{Y})^2$           mG - 1
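
The breakdown in Table 17.1 can be assembled mechanically. The sketch below is one way to do so under the stated restrictions (equal spacing, equal m); the scores and the function names are hypothetical, and each variance estimate is referred to the within-levels estimate as the error term.

```python
def coded_values(G):
    """v codes: unit steps around 0 for odd G, steps of 2 around -1, +1 for even G."""
    if G % 2 == 1:
        return list(range(-(G // 2), G // 2 + 1))
    return list(range(-(G - 1), G, 2))


def trend_table(groups):
    """Return (sums of squares with df, variance estimates, F ratios against within)."""
    G, m = len(groups), len(groups[0])
    if G < 3 or any(len(g) != m for g in groups):
        raise ValueError("need G >= 3 levels with equal m per level")
    sums = [sum(g) for g in groups]
    means = [s / m for s in sums]
    grand = sum(sums) / (G * m)
    v = coded_values(G)

    between = m * sum((mg - grand) ** 2 for mg in means)
    linear = sum(vg * s for vg, s in zip(v, sums)) ** 2 / (m * sum(vg * vg for vg in v))
    deviation = between - linear  # obtained by subtraction, as in the text
    within = sum((y - mg) ** 2 for g, mg in zip(groups, means) for y in g)

    table = {
        "between": (between, G - 1),
        "linear": (linear, 1),
        "deviation": (deviation, G - 2),
        "within": (within, m * G - G),
    }
    s2 = {name: ss / df for name, (ss, df) in table.items()}
    f_ratios = {name: s2[name] / s2["within"] for name in ("between", "linear", "deviation")}
    return table, s2, f_ratios


# Hypothetical scores: G = 4 equally spaced levels, m = 3 per level.
groups = [[4, 6, 5], [7, 9, 8], [9, 12, 9], [13, 12, 14]]
table, s2, f_ratios = trend_table(groups)
for name, (ss, df) in table.items():
    print(f"{name:10s} SS = {ss:8.3f}  df = {df}")
print(f_ratios)
```

In these invented data nearly all of the between-levels sum of squares is linear, so the F for the linear component is much larger than the over-all F for between levels.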


[...] various connotations for various people: [...] a significant linear
trend, a significant linear regression, a significant linear slope ([...] zero),
a significant linear rate of change, a significant linear component of trend.

[...] is significant, we would expect [...] It is possible for $s^2_b$ to be
insignificant while $s^2_l$ is significant simply because the latter takes into
consideration a [...]

[...] one-tail test when the direction of the difference between two means has
been predicted from theory. Certainly, if we have predicted a systematic 
increase (or decrease) of Y means for the G levels of X, the extent to 
which observations do show the predicted trend should somehow emerge 
in the statistical analysis. 

Parenthetically, there are times when theory predicts a directional 
outcome for G experimental conditions not involving levels on a manipu- 
lable ordered variable—the factor is qualitative instead of quantitative. 
Suppose for four conditions labeled A, B, C, and D that theory predicts
$\bar{Y}_A > \bar{Y}_C > \bar{Y}_D > \bar{Y}_B$ and the observations tend to confirm this predicted
ordering. Unfortunately, there seems as yet to be no satisfactory way to 
incorporate the predicted ordering of results into a significance test. 

Linear trend: correlated observations. So far our treatment of a 
linear trend has been confined to the setup where the m scores for each of 
the G groups, or levels, are independent from group to group. Suppose 
each person is measured at each level, and that the levels are again chosen 
to be equally spaced on the factor, or independent variable. This becomes 
a two-way analysis of variance setup, mixed model, with R rows for R 
persons and C columns for C levels on the factor. The differences among 
the resulting correlated column means, it will be recalled, are tested by
$F = s^2_c/s^2_{rc}$. The means for the C columns when plotted against the C
values of the independent variable may show a trend the linear component
of which we may wish to test. With equal spacings for the levels on the
independent variable, we may again set up coded, or v, scores for the points
on the independent variable and proceed to compute the sum of squares
for the linear component of the trend, $R\sum (X'_c - \bar{X})^2$, as $(\sum vX)^2/R\sum v^2$ in
exactly the manner indicated earlier for G independent groups. (Note: we
use X here as the dependent variable since the entire discussion of two-way
and three-way analysis of variance has been in terms of Xs.) The sum of
squares has 1 degree of freedom, hence is equal to an $s^2_l$.

What we have done here is to break up the between-column sum of 
squares into two component parts, a linearly predicted part and deviations 
of the means from linearity: 


$$R\sum (\bar{X}_c - \bar{X})^2 = R\sum (X'_c - \bar{X})^2 + R\sum (\bar{X}_c - X'_c)^2$$


which again allows us to obtain the sum of squares for deviations from
linearity by subtraction. This sum divided by C - 2 will give an $s^2_d$. Thus
$F_c = s^2_c/s^2_{rc}$ is a test of the main effect of the factor; $F_l = s^2_l/s^2_{rc}$ tests
the linear trend or linear regression; and $F_d = s^2_d/s^2_{rc}$ provides a test of
the departure from linearity.
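
For the correlated case the same coded computation applies, with the row-by-column interaction as the error term. The following sketch uses an invented table of R = 4 persons by C = 3 equally spaced levels; the function names are assumptions made for the illustration.

```python
def coded_values(C):
    """v codes: unit steps around 0 for odd C, steps of 2 around -1, +1 for even C."""
    if C % 2 == 1:
        return list(range(-(C // 2), C // 2 + 1))
    return list(range(-(C - 1), C, 2))


def correlated_trend(scores):
    """scores[r][c] is the score of person r at level c (C >= 3, equal spacing)."""
    R, C = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (R * C)
    col_sums = [sum(row[c] for row in scores) for c in range(C)]
    row_means = [sum(row) / C for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cols = R * sum((s / R - grand) ** 2 for s in col_sums)
    ss_rows = C * sum((rm - grand) ** 2 for rm in row_means)
    ss_rc = ss_total - ss_rows - ss_cols  # row-by-column interaction

    v = coded_values(C)
    ss_lin = sum(vc * s for vc, s in zip(v, col_sums)) ** 2 / (R * sum(vc * vc for vc in v))
    ss_dev = ss_cols - ss_lin  # deviations of the column means from the line

    s2_rc = ss_rc / ((R - 1) * (C - 1))
    return {
        "ss_cols": ss_cols, "ss_lin": ss_lin, "ss_dev": ss_dev, "ss_rc": ss_rc,
        "F_c": (ss_cols / (C - 1)) / s2_rc,
        "F_l": ss_lin / s2_rc,
        "F_d": (ss_dev / (C - 2)) / s2_rc,
    }


# Hypothetical scores: each row is one person measured at three levels.
scores = [[2, 4, 7], [3, 5, 6], [1, 4, 6], [4, 7, 9]]
result = correlated_trend(scores)
print(result)
```

The three F ratios are referred to the F table with C - 1, 1, and C - 2 degrees of freedom, respectively, for the numerator and (R - 1)(C - 1) for the denominator.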

The use of $s^2_{rc}$ instead of $s^2_w$ as the error term distinguishes between
two situations involving the relationship [...] variable (factor): $s^2_w$ is
used when [...]

[...] groups residuals. The procedure entails the calculation of the sums of
squares, $\sum x^2$ for X and $\sum y^2$ for Y, and of $\sum xy$, all three separately for each of the
G groups. The two sums of squares are computed by (3.6) and the
cross-product sum by $\sum XY - \sum X\sum Y/N$, with N replaced by $m_g$.


Table 17.2. Calculations for testing differences among G slopes


Group    df                $\sum x^2$    $\sum y^2$    $\sum xy$    $B_{yx}$     $\sum (Y' - \bar{Y})^2$    $\sum (Y - Y')^2$             df for residual
1        $m_1 - 1$         $A_1$         $B_1$         $C_1$        $C_1/A_1$    $C^2_1/A_1$                $B_1 - C^2_1/A_1$             $m_1 - 2$
g        $m_g - 1$         $A_g$         $B_g$         $C_g$        $C_g/A_g$    $C^2_g/A_g$                $B_g - C^2_g/A_g$             $m_g - 2$
G        $m_G - 1$         $A_G$         $B_G$         $C_G$        $C_G/A_G$    $C^2_G/A_G$                $B_G - C^2_G/A_G$             $m_G - 2$
Sum      $\sum m_g - G$    $\sum A_g$    $\sum B_g$    $\sum C_g$                                           $\sum (B_g - C^2_g/A_g)$      $\sum m_g - 2G$
Within   $\sum m_g - G$    $A_w$         $B_w$         $C_w$        $C_w/A_w$    $C^2_w/A_w$                $B_w - C^2_w/A_w$             $\sum m_g - G - 1$


A tabular arrangement (Table 17.2) of these sums, along with additional
indicated calculated values, will facilitate the exposition. In this table the
As, Bs, and Cs represent $\sum x^2$, $\sum y^2$, and $\sum xy$, respectively. The slope, or
$B_{yx}$, is calculated as $\sum xy/\sum x^2$; the regression sum of squares, $\sum (Y' - \bar{Y})^2$,
is given by $(\sum xy)^2/\sum x^2$; and the residual sum of squares, $\sum (Y - Y')^2$, is
obtained by subtracting the regression sum of squares from $\sum y^2$. The first
four and the last two columns are summed downward to get the "sum"
line, and the first four of these sums are entered as the first four values in
the "within" line. The next three entries in the "within" line are obtained
from the $A_w$, $B_w$, and $C_w$ of the "within" line, not by summing downward.
Note that the $A_w$, $B_w$, $C_w$, being $\sum A_g$, $\sum B_g$, and $\sum C_g$, are nothing more than
the familiar within-groups sums, obtained by first summing within groups
then summing over groups; hence, the subscript w. The student should
convince himself that $C_w/A_w$ does not correspond to the regression
coefficient that would hold in case all the groups were combined, or thrown
together (thus yielding one scatterdiagram), with sums of squares and the
sum of products computed about the grand total means. It is also true
that the value of $C_w/A_w$ is not a simple average of the $C_g/A_g$ values.

Under the null hypothesis that the population slopes for the G groups 
do not differ, we are in effect saying that a common slope holds for the G 
populations. The $C_w/A_w$ is taken as the best estimate of this common
slope, an estimate that in no way depends on possible group differences in
the X and in the Y means; we need not assume equality of means. The
residual, $B_w - C^2_w/A_w$, about the regression line with slope $C_w/A_w$, will
have $(\sum m_g - G - 1)$ degrees of freedom; G degrees of freedom are lost in
the calculation of $B_w$, and an additional degree of freedom is used up in
calculating the one slope, $C_w/A_w$. The df for $\sum (B_g - C^2_g/A_g)$ is simply the
sum of the dfs for the parts being summed; i.e., $\sum m_g - 2G$.

If all G slopes were exactly the same, each would equal $C_w/A_w$, and the
sum of the G residual sums of squares would be exactly the same as the
residual sum of squares in the "within" line. That is, $\sum (B_g - C^2_g/A_g)$
would equal $B_w - C^2_w/A_w$ exactly. But in practice, the G slopes will not
be the same, even when the population slopes are identical, simply because
of sampling errors. If it is recalled that for any sample, the slope $B_{yx}$,
taken as $rS_y/S_x$, or the exact equivalent $\sum xy/\sum x^2 = C/A$, is that value of the
slope (of the regression line) which minimizes the residual sum of squares,
it is readily seen that the residual sum of squares for, say, group g will be
larger about the line with slope $C_w/A_w$ than about the line with slope $C_g/A_g$
(unless the two slopes happen to be equal). The same will hold for all
G groups simply because $C_w/A_w$ is not the optimum value for the separate
groups. The greater the divergence of the separate G slopes from $C_w/A_w$,
the larger the residual sum of squares on the "within" line compared to
the sum of the G residual sums of squares. That is, $B_w - C^2_w/A_w$ will be
larger than $\sum (B_g - C^2_g/A_g)$. This means that $B_w - C^2_w/A_w$ as a sum of
squares may have a source of variation which does not affect the sum of the
G separate residual sums of squares. That source is the possible differences
among the G regression coefficients, or slopes.

Accordingly, we may break down the residual sum of squares in the
"within" line into two parts: a within-groups residual about the separate
regression lines, or $\sum (B_g - C^2_g/A_g)$, plus differences among slopes. The
sum of squares for slopes is obtained by subtraction: $(B_w - C^2_w/A_w)$
$- \sum (B_g - C^2_g/A_g)$. Likewise, the df for the slopes part is obtained by
subtraction: $(\sum m_g - G - 1) - (\sum m_g - 2G) = G - 1$. Division of the
sum of squares for slopes by G - 1 will yield a variance estimate, $s^2_{sl}$,
and division of $\sum (B_g - C^2_g/A_g)$ by its df will yield a within groups residual
variance estimate, $s^2_{wr}$. Then we have $F = s^2_{sl}/s^2_{wr}$ as a test of the
differences among the G slopes.
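
The steps of Table 17.2 can be followed in a few lines of code. This is a sketch on invented data for three independent groups; the function name and the dictionary keys are assumptions, while the quantities A, B, and C are the sums defined in the text.

```python
def slope_homogeneity(groups):
    """groups: list of (xs, ys) pairs, one pair per independent group.
    Returns the sums of squares, dfs, and F for differences among slopes."""
    A_w = B_w = C_w = 0.0
    resid_separate = 0.0  # sum over groups of (B_g - C_g^2 / A_g)
    n_total = 0
    for xs, ys in groups:
        m = len(xs)
        mean_x, mean_y = sum(xs) / m, sum(ys) / m
        A = sum((x - mean_x) ** 2 for x in xs)                          # sum x^2
        B = sum((y - mean_y) ** 2 for y in ys)                          # sum y^2
        C = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))    # sum xy
        resid_separate += B - C * C / A
        A_w, B_w, C_w = A_w + A, B_w + B, C_w + C
        n_total += m

    G = len(groups)
    resid_common = B_w - C_w * C_w / A_w      # residual about the common slope
    ss_slopes = resid_common - resid_separate
    df_slopes, df_resid = G - 1, n_total - 2 * G
    return {
        "common_slope": C_w / A_w,
        "ss_slopes": ss_slopes, "df_slopes": df_slopes,
        "ss_resid": resid_separate, "df_resid": df_resid,
        "F": (ss_slopes / df_slopes) / (resid_separate / df_resid),
    }


# Hypothetical data: three groups, four cases each, same X values in every group.
data = [
    ([1, 2, 3, 4], [2, 4, 5, 7]),
    ([1, 2, 3, 4], [1, 4, 6, 9]),
    ([1, 2, 3, 4], [3, 3, 4, 5]),
]
out = slope_homogeneity(data)
print(out)
```

Nothing in the function requires equal group sizes or identical X values across groups; those features of the example only keep the arithmetic easy to follow by hand.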

This test for the differences among slopes is general in that it is applic- 
able (1) when Y and X are both individual difference variables and the 
G groups are independent or (2) when Y is regarded as dependent on X as 
a manipulated variable and the G groups are independent and there is also 
independence from level to level of X. If for the latter situation the levels 
on X are equally spaced and identical for all G groups with the same 
number of cases per level within a group, the computation of the $\sum xy$ terms
and the $\sum x^2$ terms can be simplified by using the coding system (the vs)
suggested on p. 348. It is preferable, although not required, to have equal
group Ns, i.e., equal $m_g$. When both the $m_g$ and the spacings of X are
equal, $C_w/A_w$ will be the simple average of the $C_g/A_g$. Otherwise, it is a
weighted average, $m_g$ being the weight for the gth group.

Slope differences, independent groups but correlated observations within 
groups. The scores may be arranged as in a three-way analysis of 
variance, with blocks for B independent groups that are measured under 
the B conditions (either qualitatively different or as B levels on a quantita- 
tive factor), with columns for C levels on a quantitative factor or as C 
trials in a learning task, and the rows for R individuals. The observations 
from column to column are (likely) correlated because we have repeated 
measures on each individual. In ordinary analysis of variance this setup 
is Case XV (p. 335) for which the test of the B x C interaction provides a 
test of the differences among the B trends (p. 337). Our present concern is 
the differences among the linear trends, or slopes, shown by the B groups. 

When this is of interest to the experimenter he should, for sake of
computational simplicity, have equal spacings for the C levels with exactly
the same levels for all B groups. (In the learning setup, it is usually tacitly,
perhaps gratuitously, assumed that trials constitute equal spacings.) The
method to be given here presumes equal spacings for the C levels. The
linear part of any possible trend for the bth group can be specified in terms
of the best fitting line to the successive C means (the $\bar{X}_{1b}, \ldots, \bar{X}_{cb}, \ldots,$
$\bar{X}_{Cb}$) for the dependent variable, here designated as X. But the rth
individual in the bth group has C scores which permit the plotting of an
individual trend line which, in turn, may be described in part by a straight
line the slope of which will represent the linear component for the
individual's trend. These individual slopes will always show variation from
person to person, hence are variates; we may regard the slope for an
individual as a sort of "score." The average of these scores (slopes) for 
individuals in the bth group will correspond to the slope for the bth group, 
thus permitting us to regard the group slope as a mean. We will have B 
such means.

If we code the C levels as vs, we can readily calculate a slope (for
the linear regression of X on the coded scores of the factor) for each
individual. It is simply $\sum vX/\sum v^2$ (no m here since each person has just one
X score at each of the C levels). The calculation of RB (or $N_t$, where
$N_t = \sum N_b$) different values of $\sum vX$ is greatly facilitated by having the v
values on a strip that can be placed just under the Xs in a row. The $\sum v^2$ is a
constant, the same for all individuals. For the present purpose the sizes
of the B groups need not be equal.

With the individual slopes calculated, the test of the significance of 
the differences among the group slopes is not only easy to carry out but 
also easy to conceptualize. When we regard the individual slopes as 
"scores," we have a simple one-way analysis of variance setup with a 
breakdown of the total sum of squares (for the slopes) into between-group 
and within-group sums of squares with B — 1 and RB — B (or N, — B) 
degrees of freedom, respectively. The computations for this part are by 
formulas (15.6-15.8) or (15.9-15.11), with the individual slopes taken as 
the Ys for those formulas. Actually, the calculations for the test of
significance can be made on the individual $\sum vX$ values as the "scores" for
the one-way analysis of variance since the $\sum v^2$ part of the individual slopes
is a constant. $F = s^2_b/s^2_w$ is the desired test for judging whether the group
slopes (or linear regressions) are heterogeneous.
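
A sketch of this "slopes as scores" procedure follows. The two groups of persons and their scores are invented, and the function names are assumptions; the groups are deliberately of unequal size, which the text permits.

```python
def coded_values(C):
    """v codes: unit steps around 0 for odd C, steps of 2 around -1, +1 for even C."""
    if C % 2 == 1:
        return list(range(-(C // 2), C // 2 + 1))
    return list(range(-(C - 1), C, 2))


def sum_vx(person_scores):
    """Sum of v * X for one person's C scores."""
    v = coded_values(len(person_scores))
    return sum(vc * x for vc, x in zip(v, person_scores))


def one_way_f(groups):
    """Ordinary one-way F; returns (F, df between, df within)."""
    everyone = [y for g in groups for y in g]
    grand = sum(everyone) / len(everyone)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_b, df_w = len(groups) - 1, len(everyone) - len(groups)
    return (between / df_b) / (within / df_w), df_b, df_w


# Hypothetical data: B = 2 independent groups, C = 3 equally spaced levels.
group_1 = [[2, 4, 6], [1, 3, 6], [3, 4, 6]]
group_2 = [[2, 3, 3], [4, 4, 5], [1, 2, 3], [2, 2, 3]]

sums = [[sum_vx(p) for p in g] for g in (group_1, group_2)]
sum_v2 = sum(v * v for v in coded_values(3))
slopes = [[s / sum_v2 for s in g] for g in sums]

f_on_sums, df_b, df_w = one_way_f(sums)
f_on_slopes, _, _ = one_way_f(slopes)
print(f_on_sums, f_on_slopes, df_b, df_w)
```

Dividing every score by the same constant leaves F unchanged, which is why the test may be run on the $\sum vX$ values directly.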

Slope differences, observations correlated two ways. The scores can 
be arranged as in a complete three-way analysis of variance into B blocks, 
C columns, and R rows, there being a total of just R individuals. Each 
person has a score in every column and in every block—there will be 
intercolumn and also interblock correlation. Provided the C levels are 
equally spaced, we can again code in order to compute $\sum vX/\sum v^2$ as an
individual's slope for X on the column factor (independent variable) coded
as vs, but now each individual has B slopes since he has a set of C X scores
in each block. The total number of individual slopes will be RB, and
either these slopes or the $\sum vX$ values ($\sum v^2$ again being a constant) can be
arranged into a new table with R rows and B columns (each of the block
conditions is now assigned to a column position for this new table). Thus
we have a two-way analysis of variance setup, mixed model, for which
$F = s^2_c/s^2_{rc}$ with the usual dfs will provide a test of the differences between
the B slopes (one for each block) since the means of the columns in the
new table correspond to the slopes for the B blocks, each block slope being
the mean of individual slopes. Due allowance has been made for any
possible (and likely) correlation between the blocks. The foregoing
analysis is possible because for equal spacings on the C factor the slope for
the bth block of scores is the same as the mean of the R individual slopes in
the block. The B blocks may stand for B qualitatively different conditions
or for B levels on a quantitative factor, with no requirement that the B levels
be equally spaced.
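
The rearrangement into a persons-by-blocks table of slopes can be sketched as follows. The scores are invented (R = 3 persons, B = 2 blocks, C = 3 levels), and the function names are assumptions for the illustration.

```python
def coded_values(C):
    """v codes: unit steps around 0 for odd C, steps of 2 around -1, +1 for even C."""
    if C % 2 == 1:
        return list(range(-(C // 2), C // 2 + 1))
    return list(range(-(C - 1), C, 2))


def slope_table(scores):
    """scores[r][b] is the list of C scores for person r in block b.
    Returns an R-by-B table of individual slopes, sum vX / sum v^2."""
    v = coded_values(len(scores[0][0]))
    sum_v2 = sum(vc * vc for vc in v)
    return [[sum(vc * x for vc, x in zip(v, cell)) / sum_v2 for cell in person]
            for person in scores]


def mixed_model_f(table):
    """Two-way mixed model on an R-by-K table: F for columns against interaction."""
    R, K = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (R * K)
    col_means = [sum(row[k] for row in table) / R for k in range(K)]
    row_means = [sum(row) / K for row in table]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_cols = R * sum((cm - grand) ** 2 for cm in col_means)
    ss_rows = K * sum((rm - grand) ** 2 for rm in row_means)
    ss_rc = ss_total - ss_rows - ss_cols
    df_c, df_rc = K - 1, (R - 1) * (K - 1)
    return (ss_cols / df_c) / (ss_rc / df_rc), df_c, df_rc


# Hypothetical scores: each person contributes C = 3 scores in each of B = 2 blocks.
scores = [
    [[1, 2, 3], [1, 3, 5]],
    [[2, 2, 4], [2, 4, 7]],
    [[1, 3, 3], [0, 3, 6]],
]
slopes = slope_table(scores)
F, df_c, df_rc = mixed_model_f(slopes)
print(slopes, F, df_c, df_rc)
```

Each column mean of the slope table is the slope for that block, so the column test is a test of the differences among the B block slopes.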

Now suppose that the B blocks represent B equally spaced levels on a 
factor and that we wish also to consider the linear parts of the C trends 
which are exhibited when we plot the appropriate means (the $\bar{X}_{1c}, \ldots,$
$\bar{X}_{bc}, \ldots, \bar{X}_{Bc}$), separately for each of the C sets, against the B levels of
the block factor. (If the student is confused about how to pick out these
[...] should refer to Table 16.9. For the earlier discussed trends for X against
the C factor, the means along the bottom of each block are used, whereas
[...] the testing of $s^2_c$ against $s^2_{rc}$ [...]

[...] example, the original blocks and columns stood for distances and
illumination levels, respectively, blocks and columns in the rearranged table
would stand for illumination levels and distances, respectively. Regardless
of the arrangement for computing the $\sum vX$, once we have them they are the
values which become the "scores" for a two-way analysis of variance, as
before.


HYPOTHESES ABOUT CURVATURES 


[...] curving relationship. Now [...] predicted from theory, thus permitting
us to go beyond the general statement that Y depends on X, or $Y = f(X)$,
to an equation involving a specified (predicted) form for the relationship,
such as $Y' = B \log X + A$ or $Y' = Ae^{BX}$ or $Y' = X/(A + BX)$, and so on. Or on
the basis of a plot of Y (or the Y means) against X we may proceed
empirically. With knowledge concerning the shapes of various mathematical
curves, we select the form of the curve that might fit the observations.
Whether the form is
arrived at from theory or empirically, we determine the numerical values 
for the constants called for in the mathematical equation of the chosen 
form. Since the general problem of curve fitting is far beyond the scope 
of this book, the author refers the reader to the excellent discussion of the 
topic in Don Lewis’ Quantitative Methods in Psychology (New York: 
McGraw-Hill, 1960). 

Here we shall only be concerned with going a step beyond the question 
of whether a significant departure from linearity is haphazard or shows 
sufficient regularity to suggest that some type of systematic curvature is 
present. This need not be empirical in that theory might predict that a 
relationship involves an increasing, leveling off, then decreasing function 
(or a decrease, leveling, then increase) or a rapid rise followed by a leveling 
off. The theory might not be sufficiently well developed to permit a 
prediction of a more specific form for the relationship, particularly for 
parts of the curve beyond (above or below) certain chosen levels for the 
independent variable. In other words, we may merely predict that a 
segment (that for the chosen levels of X) of the relationship between Y
and X should show curvature. 

The argument in favor of proceeding to a curvature component of a
trend is similar to that given earlier (p. 350) for going from the ordinary 
F test for between levels to the testing of the significance of possible 
linear trend. We may here use the earlier illustration in which we presumed 
that the means for the dependent variable were 19, 26, 16, 23, 21 for five 
consecutive levels on the independent variable in contrast to 16, 19, 21, 23, 
26 for successive levels. Although the identical Fs for between levels 
might fail to reach significance, a significant effect might be claimed for 
the second set via linear trend. Now suppose the means are 16, 19, 26, 23, 
21 for the successive levels. A plot of these will show apparent systematic 
curvature, but very little linear trend (near zero slope). Would such an 
observed curvature prove to be a nonchance affair if tested by a method 
that gives some consideration to the systematic curving trend? If so, could 
X be claimed as having a significant effect on Y?

As a first approximation, we may regard a segment of a quadratic curve, 
defined by the equation $Y' = A + BX + CX^2$, as “fitting” (maybe) the
segment of the "curve" based on available data. The quadratic component 
resides in the CX? term, so in effect we have the question of whether C 
differs from zero. It must be understood that rarely in psychological
research will we find a logical reason for predicting a quadratic form of
relationship between a dependent variable and an independent variable 
over a wide range of values for the latter. The quadratic form is here used 
merely as a basis for testing the hypothesis that some curvature exists 
which, if taken into account, would explain a significant portion of the 


Table 17.3. Coded values, u, for quadratic component of trends for 3 to 10 levels
on an independent variable


Level     1     2     3
u        +1    -2    +1

Level     1     2     3     4
u        +1    -1    -1    +1

Level     1     2     3     4     5
u        +2    -1    -2    -1    +2

Level     1     2     3     4     5     6
u        +5    -1    -4    -4    -1    +5

Level     1     2     3     4     5     6     7
u        +5     0    -3    -4    -3     0    +5

Level     1     2     3     4     5     6     7     8
u        +7    +1    -3    -5    -5    -3    +1    +7

Level     1     2     3     4     5     6     7     8     9
u       +28    +7    -8   -17   -20   -17    -8    +7   +28

Level     1     2     3     4     5     6     7     8     9    10
u        +6    +2    -1    -3    -4    -4    -3    -1    +2    +6


[...] the dependent variable.) The sum of squares for the quadratic
component is given by $(\sum uY)^2/m\sum u^2$. Note the
similarity to the earlier given sum of squares for the linear component;
the only difference is in the coded values used. This new sum of squares
has df = 1; this df has to do with the one constant C in $Y' = A + BX + CX^2$
which controls the curvature, just as the linear component was
concerned with the one constant B in $Y' = A + BX$. With df = 1, the
value of $(\sum uY)^2/m\sum u^2$ is automatically a variance estimate, which we will
symbolize by $s^2_q$.

To test $s^2_q$ for significance, we need an error term appropriate to the
situation. If the observations are independent from level to level, the
error term is the ordinary $s^2_w$ of one-way analysis of variance. If the
observations are correlated (same persons measured at each level), we have
a two-way analysis of variance layout with C columns for C levels and R
rows for persons (m = R for the foregoing indicated computations), and the
error term is $s^2_{rc}$.

No attempt will be made here to explain the derivation of the sets of 
us in Table 17.3. Aside from the property that $\sum u = 0$ always, an examination
of any one set may help us understand more fully their use in testing
for a curvature component of trend. If the reader will plot any one set 
of us, say for G = 5, against any imagined five equally spaced values for a 
quantitative factor, X, he will have five points that follow a curve. Now 
suppose the $\sum Y$ values for all five levels are identical; a plot of the five
corresponding Y means against the five X values will, of course, be a horizontal
line. With $\sum Y$ a constant from level to level, the linear component, using
the earlier defined vs (-2, -1, 0, +1, +2), will give $\sum vY = \sum_g v_g \sum Y_g$
$= \sum Y \sum_g v_g = 0$ since $\sum v_g = 0$. The linear component is zero as it should
be for a zero slope. When we proceed to compute $\sum uY$ (equivalent to
$\sum_g u_g \sum Y_g$) with us of +2, -1, -2, -1, +2 (see Table 17.3), we have
$\sum Y \sum_g u_g$, which is also zero because $\sum u_g = 0$. No curvature when all five
Y means are identical! Now, just as the departure of $\sum vY$ from zero
indicates the presence of a linear component in the trend, the departure
of $\sum uY$ from zero indicates a quadratic component. If the five means were
such that the successive $\sum Y$ values were 10, 20, 30, 40, 50, we would have
$\sum uY = 2(10) - 1(20) - 2(30) - 1(40) + 2(50) = 0$. An obvious linear
component, but no curvature. (In this last example there was no constant
$\sum Y$ which could be taken from under the indicated summation over
groups.)
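
Although the text leaves the derivation of the us aside, one standard construction reproduces the tabled values: square the linear codes, subtract their mean, and reduce to the smallest whole numbers. The sketch below assumes that construction (it is not taken from the text) and then checks the properties just discussed; the function names are illustrative.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def linear_codes(G):
    """v codes: unit steps around 0 for odd G, steps of 2 around -1, +1 for even G."""
    if G % 2 == 1:
        return list(range(-(G // 2), G // 2 + 1))
    return list(range(-(G - 1), G, 2))


def quadratic_codes(G):
    """u codes: v^2 minus its mean, scaled to the smallest integers (G >= 3)."""
    if G < 3:
        raise ValueError("a quadratic component needs at least three levels")
    v = linear_codes(G)
    mean_sq = Fraction(sum(x * x for x in v), G)
    centered = [Fraction(x * x) - mean_sq for x in v]
    # Clear the common denominator, then divide out the common factor.
    denom = reduce(lambda a, b: a * b // gcd(a, b), (c.denominator for c in centered), 1)
    ints = [int(c * denom) for c in centered]
    common = reduce(gcd, (abs(i) for i in ints))
    return [i // common for i in ints]


for G in range(3, 11):
    print(G, quadratic_codes(G))

v5, u5 = linear_codes(5), quadratic_codes(5)
level_sums = [10, 20, 30, 40, 50]          # the purely linear example from the text
linear_contrast = sum(v * y for v, y in zip(v5, level_sums))
quadratic_contrast = sum(u * y for u, y in zip(u5, level_sums))
print(linear_contrast, quadratic_contrast)
```

Each set of us sums to zero and is uncorrelated with the corresponding vs, which is why a purely linear set of sums yields a zero quadratic contrast.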
Let us again consider the three sets of five means mentioned earlier 
(p. 357): 
Set A 19 26 16 23 21 
Set B 16 19 21 23 26 
Set C 16 19 26 23 21

With each mean based on m cases, the required $\sum Y$ for group g will be
$m\bar{Y}_g$; hence the $\sum vY$ for a set will be $\sum vm\bar{Y} = m\sum v\bar{Y}$ and the $\sum uY$ for a set
would be $\sum um\bar{Y} = m\sum u\bar{Y}$.

For Set A we have for the linear component

$$\sum vY = m[-2(19) - 1(26) + 0(16) + 1(23) + 2(21)] = +1(m)$$

and for the quadratic component we have

$$\sum uY = m[+2(19) - 1(26) - 2(16) - 1(23) + 2(21)] = -1(m)$$

from which we see that both the linear and the quadratic components are
perhaps negligible (perhaps because no significance test has been
applied).

For Set B we have

$$\sum vY = m[-2(16) - 1(19) + 0(21) + 1(23) + 2(26)] = 24(m)$$

and

$$\sum uY = m[+2(16) - 1(19) - 2(21) - 1(23) + 2(26)] = 0(m)$$

for which we see a sizable linear component with no quadratic component.

For Set C we have

$$\sum vY = m[-2(16) - 1(19) + 0(26) + 1(23) + 2(21)] = 14(m)$$

and

$$\sum uY = m[+2(16) - 1(19) - 2(26) - 1(23) + 2(21)] = -20(m)$$

which indicates the possible presence of both components.

For sake of illustration, let us suppose that $s^2_w = 71$ and $m = 10$.
The between-groups sum of squares [...] to an $s^2_b$ of 145. The [...]
yields $F = s^2_b/s^2_w = 145/71 = 2.04$, which
does not quite reach the .10 level of significance. This F holds, of course,
for all three sets.

[...] component sum of squares, $(\sum vY)^2/m\sum v^2 = 100/100 = 1$;
hence $s^2_l = 1$, and $F = 1/71$, which is far from significant. The [...]
component sum of squares [...] 285.71, from which we get $F = 285.71/71 = 4.02$,
which is significant at the .05 level.

Admittedly, the foregoing starting means, with ms of 10 and $s^2_w$ of 71,
were contrived for a purpose: to show that the ordinary F ($s^2_b/s^2_w$ or
$s^2_c/s^2_{rc}$) test is not sensitive to possible systematic trends and that the
sensitivity of the statistical analysis for an effect can be improved by a 
method that takes into consideration the systematic trend shown by the 
data. This bonus in sensitivity is particularly deserved by the experi- 
menter who has made an a priori prediction either that the trend will 
involve a linear component or that it will have simple curvature (or both). 

The foregoing methods of analysis can be extended in two directions. 
(1) Group differences in quadratic components can be tested in all those 
types of setups for which we have discussed tests of differences in linear 
components. Space does not permit the presentation of these seldom- 
needed extensions. (2) The between-groups (levels) sum of squares can be 
broken down into additional components—cubic, quartic, quintic, etc.— 
each with df of 1. Since these polynomial forms of relationship are scarce 
in the empirical data of psychology and are even more scarce in the minds 
of psychological theorists, there would seem to be no good reason for 
going beyond the second degree polynomial (the quadratic) in this business 
of extracting components, with its implication that lawful relationships 
among psychological variables somehow involve a cubic or higher order 
polynomial curve. Admittedly, the quadratic form of relationship may 
rarely hold for psychological variables, but the testing of the quadratic 
component does provide us with a more sensitive statistical test of an 
effect than is possible by $F = s_b^2/s_w^2$ when systematic curvature is present
and/or has been predicted.


Chapter 18 


ANALYSIS OF VARIANCE: 
COVARIANCE METHOD 


[...] experimentation to choose, either by random [...], groups that are
comparable on [...] comparisons to be made. There are times, [...] to use
intact groups which may differ [...]. Occasionally we may wish to make an
[...] not seem justifiable in light of known [...]. Experimental control is
the ideal, but, if this cannot be attained, we may resort to statistical
allowances and thereby [...]

Suppose that two intact groups are being used to evaluate the relative
merits of two methods of memorizing and that the mean IQ is 105 for [...]
difference in IQs. Let us suppose that the mean memory performance is
60 for group A and 70 for group B, and that [...] substituting 105 and 111
in the regression equation yields a predicted value [...]

The next question concerns the proper sampling error to use in evaluating
the adjusted difference. It should be obvious that the ordinary
procedure is inapplicable for the simple reason that we have tampered with 
the obtained means and in so doing have interfered somewhat with the 
operation of chance. 

It is the purpose of this chapter to give a precise method for making
allowance for an uncontrolled variable and to set forth the sampling error
adjustment which is needed in testing the statistical significance of the
difference between "corrected" means. The method is applicable whenever
it seems desirable to correct a difference on a dependent variable for a 
known difference on another variable which for some reason could not 
be controlled by matching or by random sampling procedures. Since the 
scheme about to be proposed has an analysis of variance setting, the reader 
can readily guess that it will provide an adjustment for, and a test of 
significance of, the differences between two or more groups, and that it will 
be usable for either large or small samples. It is assumed that the dependent
variable has a distribution which does not depart too far from the normal 
type and that the variances from group to group are similar. 

In order to present the required adjustments, we need first to consider
covariance, which is defined as $\Sigma xy/N$ or $\Sigma(X - \bar{X})(Y - \bar{Y})/N$. The sum
of products of deviations can be broken down into components in a manner
similar to that used with a sum of squares. In the simplest situation we can
have $m$ pairs of $X$ and $Y$ scores in each of $G$ groups. These pairs of
scores can be recorded in some such fashion as that depicted in Table 18.1.


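Table 18.1. Schema of scores for covariance

                                Group
      1              2         ...       g         ...       G
  X_11  Y_11     X_12  Y_12    ...   X_1g  Y_1g    ...   X_1G  Y_1G
  X_21  Y_21     X_22  Y_22    ...   X_2g  Y_2g    ...   X_2G  Y_2G
   ...
  X_i1  Y_i1     X_i2  Y_i2    ...   X_ig  Y_ig    ...   X_iG  Y_iG
   ...
  X_m1  Y_m1     X_m2  Y_m2    ...   X_mg  Y_mg    ...   X_mG  Y_mG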


Note that $X_{ig}$ and $Y_{ig}$ stand for the $X$ and $Y$ values of the $i$th individual
in the $g$th group. Note also that in allowing $i$ to take on values running
from 1 to $m$ we do not imply any order for the individual, and that the
$i$th individual in one group is in no sense paired with the $i$th case in another
group. The product of the deviation scores for the $i$th individual in the $g$th
group would be $(X_{ig} - \bar{X})(Y_{ig} - \bar{Y})$, in which $\bar{X}$ and $\bar{Y}$ are the means for
all $mG$ cases. The total sum of products would be $\Sigma\Sigma(X_{ig} - \bar{X})(Y_{ig} - \bar{Y})$.

Now each deviation can be expressed in terms of two components in
exactly the same way as in Chapter 15; i.e., one part is the deviation of the
score from the mean of the group to which it belongs, and the other part
is the deviation of the group mean from the total mean. Thus we have

$$(X_{ig} - \bar{X}) = (X_{ig} - \bar{X}_g) + (\bar{X}_g - \bar{X})$$

and

$$(Y_{ig} - \bar{Y}) = (Y_{ig} - \bar{Y}_g) + (\bar{Y}_g - \bar{Y})$$

Then the foregoing sum of the products becomes

$$\Sigma\Sigma[(X_{ig} - \bar{X}_g) + (\bar{X}_g - \bar{X})][(Y_{ig} - \bar{Y}_g) + (\bar{Y}_g - \bar{Y})]$$

When the bracketed expressions are multiplied together, four terms result,
and, since two of these vanish, we have left that the total sum of products is
equal to

$$\Sigma\Sigma(X_{ig} - \bar{X}_g)(Y_{ig} - \bar{Y}_g) + m\Sigma(\bar{X}_g - \bar{X})(\bar{Y}_g - \bar{Y})$$

The first of these terms involves a within-groups sum of products, whereas
the second is for between groups. If there happens to be an unequal number
of cases per group, the $m$ of the second term goes under the summation
sign as $m_g$. The degrees of freedom for the total sum of products is
$mG - 1$, or $N - 1$, where $N$ is the sum of the $m_g$s; the $df$s for the within
and between terms are $mG - G$ (or $N - G$) and $G - 1$ respectively.
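For the reader who wishes to see why two of the four terms vanish, the step can be sketched as follows. The subscripted summation signs ($\sum_g$ over groups, $\sum_i$ over individuals within a group) are our own shorthand, introduced here only for clarity.

```latex
% Cross terms: the group-level factor is constant within a group, so it
% comes outside the inner sum, and deviations about a group mean sum to zero.
\sum_g \sum_i (X_{ig} - \bar{X}_g)(\bar{Y}_g - \bar{Y})
  = \sum_g (\bar{Y}_g - \bar{Y}) \sum_i (X_{ig} - \bar{X}_g) = 0,
\qquad
\sum_g \sum_i (\bar{X}_g - \bar{X})(Y_{ig} - \bar{Y}_g)
  = \sum_g (\bar{X}_g - \bar{X}) \sum_i (Y_{ig} - \bar{Y}_g) = 0.

% Remaining group-level term: the summand does not depend on i, so the
% inner sum over the m cases of a group simply multiplies it by m.
\sum_g \sum_i (\bar{X}_g - \bar{X})(\bar{Y}_g - \bar{Y})
  = m \sum_g (\bar{X}_g - \bar{X})(\bar{Y}_g - \bar{Y}).
```

With unequal group sizes the same argument holds with $m_g$ in place of $m$ inside the summation over groups.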

It will be of convenience to assemble in a table the sums of products, 
along with the sums of squares, for both the X and Y variables. These will 
be found in the first three lines of Table 18.2. 

Although we are here presenting the covariance technique as a method 
for making such adjustments as discussed in introducing this chapter, it is 
of interest to link covariance with the problem of correlation. The product 
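moment correlation coefficient is usually defined as

$$r = \frac{\Sigma xy}{N S_x S_y}$$

which may be written as

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2}\sqrt{\Sigma y^2}} = \frac{\Sigma(X - \bar{X})(Y - \bar{Y})}{\sqrt{\Sigma(X - \bar{X})^2\,\Sigma(Y - \bar{Y})^2}}$$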


or as a function of a sum of products and two sums of squares. Using the 
sums of Table 18.2, we may specify three correlations: one based on the 
total sums, one based on the within sums, and one based on the between 
sums. These three correlations are indicated in line 5 by letters A, B, and 
C, with appropriate subscripts used to designate the several sums in the first 
three lines of the table. Line 5a gives the dfs for the rs.

Note that the between-groups r is actually the correlation between the X
means and the Y means for the groups. If this r is significant, it follows
that one source of the correlation for the total group is the heterogeneity
resulting from the throwing together of groups with unlike means. (This
between-groups correlation is meaningless when only two groups are
involved. Why?) Stated differently, an appreciable between-groups r
indicates that the total r is spurious; this spuriousness is eliminated when
r is computed from the within sums. The similarity of the within-groups r 
to the partial correlation coefficient will be recognized by the discerning 
student, especially if he recalls the derivation of the latter. 

We now turn to the use of covariance as a basis for allowing for the 
influence of an uncontrolled variable on the differences between group 
means. The question here is not what the result would be if the uncontrolled
[...] Table 18.2, which will serve as an outline of the required
computations. Line 6 of this table gives [...] predicting $X$ from $Y$. Since
no use will [...] it need not be computed.

That these $A/C$ values are regression coefficients can readily be
demonstrated. In Chapter 9 the regression of $X$ on $Y$ was given as

$$b_{xy} = r\frac{S_x}{S_y}$$

Since, as we have seen previously,

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2\,\Sigma y^2}}, \quad S_x = \sqrt{\frac{\Sigma x^2}{N}}, \quad \text{and} \quad S_y = \sqrt{\frac{\Sigma y^2}{N}}$$

we have

$$b_{xy} = \frac{\Sigma xy}{\Sigma y^2} = \frac{A}{C}$$

In order to make allowance [...] need not only to adjust the $\bar{X}_g$ values [...]
error term, which is used as the [...] difference between the adjusted [...]
Chapter 15, $F$ will involve the ratio of between [...] variance estimate.

First, let us consider the method of making the adjustment to the total
and to the within-groups variance estimates. The problem here is that of 
specifying how much of the variation in X can be predicted from variation 
in Y and then of subtracting this to secure the left-over variation as an 
adjusted value. But this left-over variance is nothing more than the 
residual variance, or square of the standard error of estimate, obtainable 


from formula (9.6):

$$S_{x \cdot y}^2 = S_x^2 - r^2 S_x^2$$

Actually the adjustment is to be made to the sum of squares. In order to
state the residual variance in terms of sums, we may substitute for $S_x^2$ and $r^2$.
Thus,

$$S_{x \cdot y}^2 = \frac{\Sigma x^2}{N} - \frac{(\Sigma xy)^2}{\Sigma x^2\,\Sigma y^2} \cdot \frac{\Sigma x^2}{N}$$

hence,

$$N S_{x \cdot y}^2 = \Sigma x^2 - \frac{(\Sigma xy)^2}{\Sigma y^2}$$

Since $NS^2$ always equals a sum of squares, the value of $NS_{x \cdot y}^2$ is obviously
the sum of squares for the residuals. In the notation of this chapter,

$$N S_{x \cdot y}^2 = \Sigma\Sigma(X_{ig} - \bar{X})^2 - \frac{[\Sigma\Sigma(X_{ig} - \bar{X})(Y_{ig} - \bar{Y})]^2}{\Sigma\Sigma(Y_{ig} - \bar{Y})^2}$$

would be the residual sum of squares after the regression adjustment. This
sum can be written as

$$B_t - \frac{A_t^2}{C_t}$$

which is the entry for the total group in line 7 of Table 18.2. Similarly,
the corresponding residual, or adjusted, sum of squares for within groups
is $B_w - A_w^2/C_w$.

At first thought it would seem logical to adjust $B_b$ by the use of $A_b$
and $C_b$, but the between-groups correlation (and regression) is affected by
the differences between the X means, which are the differences to be
adjusted and then tested for statistical significance. Our adjustment should
be one which is independent of the differences to be tested. This suggests
that the regression for within groups, or $A_w/C_w$, should be used since the
regression for the total is also affected by the difference which we are out to
test. Insofar as we are concerned solely with the adjustment of the between-
groups X means, the best adjustment would be by means of the within-
groups regression. This could take the form of either an adjustment to the
between-groups sum of squares for X or a direct adjustment to the several
$\bar{X}_g$ values.

Although the latter would be the best way of ascertaining how much of 
an effect the noncomparability of the groups with respect to Y had on the 
X means, there is another consideration as to whether the within regression 
is appropriate for adjusting the between-groups sum of squares. It will
be recalled that F is to be taken as the ratio of a variance estimate based
on the between sum of squares to that based on within groups, and that 
the two variance estimates being so compared must be independent esti- 
mates. Now, if we adjust both the within and the between sum of squares 
by means of the same regression coefficient (say, that based on within 
groups), any sampling error in this regression coefficient would have a 
similar effect on both adjustments; hence it could not be argued that the 
resulting adjusted sums of squares possess the requisite independence.
Therefore variance estimates based thereon would not be strictly indepen- 
dent. 

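This difficulty is overcome by taking the adjusted sum of squares for
between groups as the difference between the adjusted total sum and the
adjusted within sum of squares. Thus, for the purpose of testing
significance,

$$\left(B_t - \frac{A_t^2}{C_t}\right) - \left(B_w - \frac{A_w^2}{C_w}\right)$$

leads to the proper adjustment for the between sum of squares for X.

Perhaps the reader has anticipated that the dfs may change as a result of
these manipulations. The new dfs are recorded in line 8 of Table 18.2.
Note that the df for the between sum has not changed since the adjustment
[...]-groups regression.

[...] calculating sums of squares, we need [...] products in terms of raw
scores. The [...] unequal $m_g$ values, but are of course [...]

$$\Sigma\Sigma(X_{ig} - \bar{X})(Y_{ig} - \bar{Y}) = \Sigma\Sigma X_{ig}Y_{ig} - \frac{(\Sigma\Sigma X_{ig})(\Sigma\Sigma Y_{ig})}{N} \quad \text{for total} \qquad (18.1)$$

$$\Sigma\Sigma(X_{ig} - \bar{X}_g)(Y_{ig} - \bar{Y}_g) = \Sigma\Sigma X_{ig}Y_{ig} - \sum_g \frac{(\Sigma_i X_{ig})(\Sigma_i Y_{ig})}{m_g} \quad \text{for within} \qquad (18.2)$$

$$\Sigma m_g(\bar{X}_g - \bar{X})(\bar{Y}_g - \bar{Y}) = \sum_g \frac{(\Sigma_i X_{ig})(\Sigma_i Y_{ig})}{m_g} - \frac{(\Sigma\Sigma X_{ig})(\Sigma\Sigma Y_{ig})}{N} \quad \text{for between} \qquad (18.3)$$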



Thus to compute the sums of products of deviations, we need the sum of
all $N$ raw score products or $\Sigma\Sigma X_{ig}Y_{ig}$, the sum of all the Xs or $\Sigma\Sigma X_{ig}$,
the sum of all the Ys or $\Sigma\Sigma Y_{ig}$, the sum of the Xs separately for each group
or $\Sigma_i X_{ig}$, and the sum of the Ys for each separate group or $\Sigma_i Y_{ig}$. Adding
the several X sums gives the sum of all the Xs; likewise for Ys. Note that
to get the second term of (18.2), or the first term of (18.3), we must divide
the product of the two sums for a group by its $m$ and then sum such
quotients over all $G$ groups. The reader may find some interest in comparing
formulas (18.1-18.3) with formulas (15.9-15.11) and it should be
apparent that in the case of equal $m$s formulas (18.1-18.3) can be written
in the simpler way of formulas (15.6-15.8).

Table 18.3. Score data and sums based on raw scores for analysis of variance by
covariance adjustments

                              Group
                 1           2           3
               Y     X     Y     X     Y     X
              14    10    11     5     7     5     ΣΣX = 173
               9     6     9     2     6     4     ΣΣY = 268
              11     8     8     6     2     1
              12     6    10     5    10     7     ΣΣX² = 1161
              10     9    10     4     7     9     ΣΣY² = 2642
              11     7    10     8     7     4
              11     9    12    10     6     5     ΣΣXY = 1688
               8     5     9     6     3     2
              11     6    10     4     2     2     Σ(ΣX)² = 10,401
              12     7    11     6     9     5     Σ(ΣY)² = 25,362
Sum          109    73   100    56    59    44     X̄ = 5.77
Mean        10.9   7.3  10.0   5.6   5.9   4.4     Ȳ = 8.93
ΣY² or ΣX²  1213   557  1012   358   417   246
ΣXY             810         571         307

The required computations are illustrated by using the data (fictitious) 
of Table 18.3, which contains Y and X scores for ten cases in each of 
three groups. The scores in each of the six columns are separately summed 
to yield 109, 73, etc. The scores are squared and summed to yield 1213, 
557, etc. Summing the products of the X and Y values gives 810, 571, and
307 for the three groups. Summing over groups yields the double
summations 173, 268, etc. Certain of these sums are then substituted into
formulas (18.1-18.3) to secure the total, within, and between sums of 
products of deviations. By substituting the proper sums into formulas 
(15.6-15.8), we get the required sums of squares for the Xs and for the Ys. 
Then these three sets of sums are entered as the first three rows of Table 
18.4, which follows the pattern set forth in Table 18.2. 
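The raw-score route to the sums of products and sums of squares can be sketched in a few lines. This is a minimal illustration rather than a prescribed procedure: the helper `partition` and the accessors `get_x` and `get_y` are our own names, and the sixth Y score of group 3 is taken as 7, an assumption made so that the column totals agree with the printed sums of 59, 417, and 307.

```python
# (Y, X) pairs per group, as read from Table 18.3. The sixth Y score of
# group 3 is taken as 7 (an assumption) so that the column totals agree
# with the printed sums 59, 417, and 307.
GROUPS = [
    [(14, 10), (9, 6), (11, 8), (12, 6), (10, 9),
     (11, 7), (11, 9), (8, 5), (11, 6), (12, 7)],
    [(11, 5), (9, 2), (8, 6), (10, 5), (10, 4),
     (10, 8), (12, 10), (9, 6), (10, 4), (11, 6)],
    [(7, 5), (6, 4), (2, 1), (10, 7), (7, 9),
     (7, 4), (6, 5), (3, 2), (2, 2), (9, 5)],
]


def get_x(pair):
    return pair[1]


def get_y(pair):
    return pair[0]


def partition(groups, f, g):
    """Total, within, and between sums of products of deviations for the
    two variables picked out by f and g (f == g gives sums of squares)."""
    n = sum(len(grp) for grp in groups)
    # Sum of all N raw-score products.
    raw = sum(f(p) * g(p) for grp in groups for p in grp)
    # Product of the two group sums divided by group size, summed over groups.
    per_group = sum(
        sum(f(p) for p in grp) * sum(g(p) for p in grp) / len(grp)
        for grp in groups
    )
    # Product of the two grand sums divided by N.
    correction = (
        sum(f(p) for grp in groups for p in grp)
        * sum(g(p) for grp in groups for p in grp) / n
    )
    return {
        "total": raw - correction,          # formula (18.1)
        "within": raw - per_group,          # formula (18.2)
        "between": per_group - correction,  # formula (18.3)
    }


A = partition(GROUPS, get_x, get_y)  # sums of products
B = partition(GROUPS, get_x, get_x)  # sums of squares for X
C = partition(GROUPS, get_y, get_y)  # sums of squares for Y

if __name__ == "__main__":
    for label, sums in (("products", A), ("squares X", B), ("squares Y", C)):
        print(label, {k: round(v, 2) for k, v in sums.items()})
```

Because a sum of squares is just a sum of products of a variable with itself, one function serves for all three of the first lines of the table.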


Table 18.4. Analysis of variance for X variable of Table 18.3 by covariance
adjustments for uncontrolled Y

                              Total         Within        Between
1.  Sum of products          142.53          72.70          69.83
2.  Sum of squares: X        163.37         120.90          42.47
3.  Sum of squares: Y        247.87         105.80         142.07
4.  df                           29             27              2
5.  Correlation                .709           .643           .912
5a. df for r                     28             26              1
6.  b_xy value                .5750          .6871          [...]
7.  Adjusted Σx²              81.42  minus   70.95  equals  10.47
8.  df                           28             26              2


Before proceeding to the covariance adjustment, let us consider the 
means given in Table 18.3. It will be noticed that the groups differ 
considerably on X, or the dependent variable, and that they also differ on 
Y, the relevant but not controlled variable. An analysis of variance based
on the sum of squares for the Xs leads to a between-groups variance
estimate of 42.47/2, or 21.24, and a within-groups estimate of 120.90/27, or
4.48. The F for testing the significance of the between-groups variance
becomes 21.24/4.48, or 4.74, which for the given dfs is significant at about
the .02 or .03 level of significance. This analysis does not, of course, allow
for the fact that the groups differ on Y. If there is correlation between X
and Y, the observed differences on X may be mainly a reflection of the
group differences on Y. As previously stated, the purpose of the covari- 
ance adjustment is to make statistical allowance for such uncontrolled 
differences. 

By following the steps indicated in Table 18.2, we determine the values
in lines 5 to 7 of Table 18.4. Note that the adjusted $\Sigma x^2$ for between groups,
10.47, is secured by subtracting 70.95 from 81.42. The analysis of variance
based on the adjusted sums of squares (for the Xs) gives a between-groups
variance estimate of 10.47/2, or 5.23, and a within-groups estimate of
70.95/26, or 2.73. Then $F = 5.23/2.73 = 1.92$, which for 2 and 26 degrees
of freedom yields a $P$ of about .20. Accordingly, it cannot be concluded
that there are significant group differences on X over and above those which
would be expected because of the differences on Y.
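The adjustment and the resulting $F$ can be reproduced from the first three lines of Table 18.4. The sketch below assumes a single uncontrolled variable; the function names `adjusted_ss` and `ancova_f` are ours, chosen only for this illustration.

```python
def adjusted_ss(a, b, c):
    """Residual sum of squares for X after regression on Y: B - A^2 / C."""
    return b - a * a / c


def ancova_f(total, within, n_groups, n_cases):
    """Each of total and within is an (A, B, C) triple: the sum of products,
    the sum of squares for X, and the sum of squares for Y."""
    adj_total = adjusted_ss(*total)
    adj_within = adjusted_ss(*within)
    # The adjusted between sum is obtained by subtraction, not by a
    # separate regression adjustment.
    adj_between = adj_total - adj_within
    df_between = n_groups - 1
    df_within = n_cases - n_groups - 1   # one df lost to the regression
    f_ratio = (adj_between / df_between) / (adj_within / df_within)
    return adj_total, adj_within, adj_between, f_ratio


# Sums taken from lines 1 to 3 of Table 18.4.
result = ancova_f(
    total=(142.53, 163.37, 247.87),
    within=(72.70, 120.90, 105.80),
    n_groups=3,
    n_cases=30,
)

if __name__ == "__main__":
    print([round(value, 2) for value in result])
```

Small differences in the second decimal place from the tabled entries are to be expected, since the inputs here are already rounded to two places.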

It should be obvious that the use of the covariance adjustment method 
must be justified by logical and experimental considerations. When it is 
logical to control a variable by pairing or matching, the covariance 
adjustment is defensible as a way of making proper allowance for a failure, 
because of infeasibility, to control the variable. The use of the covariance 
adjustment is not predicated on the degree of correlation between the 
dependent and the uncontrolled variable. If the correlation is relatively 
low, the adjusted values will differ but little from the unadjusted values; 
if high, both the total and within adjusted variances will differ considerably 
from the unadjusted variances, but, as we shall presently see, the extent to 
which the adjusted and unadjusted between-groups variances differ is not 
solely a function of the correlation. 

It is of interest to make an actual adjustment of the X means of Table
18.3 for the group differences on Y. The adjustments can be made by

$$\bar{X}_{ag} = \bar{X}_g - b_{xy}(\bar{Y}_g - \bar{Y})$$

in which $\bar{X}_{ag}$ is the adjusted value for the $g$th group, and $b_{xy}$ is the within-
groups regression coefficient. For the data of Table 18.3 we have

$$\bar{X}_{a1} = 7.30 - .687(10.90 - 8.93) = 5.95$$
$$\bar{X}_{a2} = 5.60 - .687(10.00 - 8.93) = 4.86$$
$$\bar{X}_{a3} = 4.40 - .687(5.90 - 8.93) = 6.48$$

Should the reader be surprised that the adjustment puts group three
ahead, he should ponder the fact that, relative to the within-groups X and Y
variances, the third group's $\bar{X}$ of 4.40 was not as far below the means of the
other two groups as was its $\bar{Y}$ of 5.90.
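The three adjusted means can be verified directly. In this sketch the function name `adjusted_means` is ours, and the rounded values .687 and 8.93 are used for the within-groups regression coefficient and the grand Y mean, as in the worked figures.

```python
def adjusted_means(x_means, y_means, b_within, y_grand):
    """Adjust each group's X mean for its departure from the grand Y mean."""
    return [x - b_within * (y - y_grand) for x, y in zip(x_means, y_means)]


adj = adjusted_means(
    x_means=[7.30, 5.60, 4.40],
    y_means=[10.90, 10.00, 5.90],
    b_within=0.687,   # within-groups regression coefficient, 72.70/105.80
    y_grand=8.93,     # mean of all 30 Y scores
)

if __name__ == "__main__":
    print([round(value, 2) for value in adj])
```

A group whose Y mean lies above the grand mean has its X mean lowered, and a group below the grand mean has its X mean raised, each in proportion to the within-groups regression.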

From a careful consideration of the foregoing, it will be seen that the
covariance adjustment method will not necessarily reduce the differences 
between the means on the dependent variable. Situations arise in which 
groups that show marked differences on some correlated but uncontrolled 
variable may yield similar means on the variable being studied. Suppose 
that we are using two intact groups to investigate the relative merits of 
two learning methods, and that the initial means of the two groups are 
markedly different. We would, accordingly, expect a difference on final 
standing even though the two methods were equally efficacious. If this 
expected difference is not found, it follows that the method used by the 
group with the lower initial score was more effective in that this group 
overtook the other group. With groups differing on an uncontrolled 
variable, it is not only as proper, but also as necessary, to use the covariance 
technique when the groups are nearly the same on the dependent variable
as when they are different. For such situations the adjustment will increase
the between-groups variance. The adjusted variances are sometimes 
referred to as "reduced" variances, but it follows from the foregoing that 
this term may be a misnomer for the adjusted between-groups variance. 
The extent to which the adjusted variances lead to a level of significance 
different from that based on an analysis of the unadjusted values will 
obviously depend on three things: the degree of correlation between the 
dependent and uncontrolled variable, the size of the differences between 
the groups on the uncontrolled variable, and the found differences on the 
dependent variable. The applicability of the covariance technique does 
not depend on a minimum degree of correlation or on a definite amount 
of group differences on the uncontrolled variable. But, if the within- 
groups correlation is low and/or there is only a small, chance difference 
between the groups on the uncontrolled variable, the use of the covariance 
adjustment may not be worth the effort. Obviously, if a variable correlates
near zero with the dependent variable, it need not be controlled
experimentally or statistically.

The covariance method can be extended to make adjustments for group
differences on more than one uncontrolled variable. This involves the use
of multiple regression, but computationally it is perhaps simpler to handle
the adjustments in terms of multiple $r$s. We need two multiple correlation
coefficients, one obtained by way of correlations based on within-groups
sums of squares and of products, and the other by way of correlations based
on total sums of squares and of products.

If, for example, allowance is to be made for three uncontrolled variables,
$Y_1$, $Y_2$, and $Y_3$, we will need six (one for each pair of variables; X is the
fourth or dependent variable) auxiliary [...] those in lines 1, 2, and 3 under
the "total" [...] are made) among [...] these we compute, by the methods set
forth in Chapter 11, two $R^2$ values. Let us designate the multiple based on
the total sums as $R_t$ and that based on the within sums as $R_w$.

With these two multiple $r$s available, we may rewrite line 7 of Table 18.2
as

$$B_t(1 - R_t^2) \quad \text{minus} \quad B_w(1 - R_w^2) \quad \text{equals adjusted } B_b$$

with respective dfs of

$$N - n, \qquad N - G - n + 1, \qquad G - 1$$

for the $n$ variable problem (one dependent, plus the number of uncontrolled
variables included in the adjustments).

Remark: The use of the covariance adjustment technique is far superior
to attempts at pairing individuals from the intact groups on the basis of
one or more uncontrolled variables, a procedure which inevitably leads to a
reduction of sample size and also runs astride a regression difficulty (see
p. 161).

Evaluation of changes. In Chapter 6 we discussed the usually advocated 
method for comparing changes shown by experimental and control 
groups (applicable also for two experimental groups). We have, with $i$ and
$f$ standing for the pretest and posttest measures and $E$ and $C$ standing for
experimental and control groups,

$$D = D_E - D_C = (\bar{X}_{fE} - \bar{X}_{iE}) - (\bar{X}_{fC} - \bar{X}_{iC})$$

as the net change, the change shown by the experimentals corrected for
that shown by the controls. We may rearrange the $\bar{X}$s, yet maintain the
numerical value of $D = D_E - D_C$, as follows:

$$D = (\bar{X}_{fE} - \bar{X}_{fC}) - (\bar{X}_{iE} - \bar{X}_{iC})$$

from which it is seen that the net change may also be thought of as the
final difference between the two groups corrected for their initial difference.
Such a correction involves the assumption that each unit of difference in
initial standing will produce a unit of difference in final standing. In other
words, this type of adjustment implies a 1-to-1 relationship between initial
and final scores. Since a perfect correlation is never found or approached 
in practice, one may question whether the usual procedure of comparing 
changes is really defensible. 
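The contrast between the two kinds of correction can be put in a single line. The sketch below assumes that the initial score plays the role of the uncontrolled variable, that $b$ denotes the within-groups regression of final on initial scores, and that $D_{adj}$ (our symbol) is the difference between the covariance-adjusted final means of the two groups.

```latex
% Difference between covariance-adjusted final means: the initial
% difference is weighted by the within-groups regression coefficient b.
D_{adj} = (\bar{X}_{fE} - \bar{X}_{fC}) - b\,(\bar{X}_{iE} - \bar{X}_{iC})

% Setting b = 1 recovers the ordinary net change:
D = (\bar{X}_{fE} - \bar{X}_{fC}) - (\bar{X}_{iE} - \bar{X}_{iC})
```

Seen this way, the ordinary method of comparing changes is the special case in which the regression of final on initial scores is taken to be exactly unity, whatever the data may show.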

It is, of course, entirely logical that group differences on final scores, 
which we may here call the dependent variable, should be corrected for 
group differences on initial standing as an uncontrolled variable. The 
covariance adjustment technique provides a way of correcting final means 
for initial differences, with due allowance for the degree of correlation 
between initial and final scores. The ordinary and the covariance method 
differ not only in the correction but also in the resultant sampling error. 
The ordinary technique uses a standard error which definitely includes, 
either explicitly or implicitly, the variance for both initial and final scores 
and the correlation of initial with final, whereas the error term used in the 
covariance method is a direct function of the degree of correlation and of 
the variance for the final scores only. In other words, the net differences 
being tested are not the same, and neither are the error terms the same. 
The covariance method will, in general, be more sensitive. The student 
should read Professor R. A. Fisher's discussion on this point.* 


* Chapter IX in Fisher, R. A., Design of experiments, London: Oliver and Boyd. 


Chapter 19 
DISTRIBUTION-FREE METHODS 


The F, t, and z techniques are sometimes referred to as parametric
because they involve the estimation of at least one parameter (population
value); e.g., a population variance estimate is needed as the error term
for F. These techniques also involve, at the derivation stage, an assumption
of a normal distribution (of some variate). The techniques to be presented
in this brief chapter are sometimes called nonparametric because they do
not involve the estimation of parameters and they are sometimes referred
to as distribution-free methods because no assumptions are made about the
distribution of variates.

Advocates of nonparametric tests usually do not stress their nonparametric
character as much as their distribution-free property. It is said
that distribution-free methods should be used because the assumption
of normality, on which parametric tests are based, may not hold. But in
light of Norton's study (p. 252) and Boneau's results (p. 106), the worry
about violating this assumption seems ill-founded.

Another argument advanced in favor of nonparametric methods springs
from the level of measurement achieved by measuring instruments. These
levels are referred to as nominal, ordinal, interval, and ratio scales of
[...] used for coding the categories, and frequenci[es] [...] proportions or
percentages. The ordinal scale [...] order positions usually specified by
numb[ers] [...] which equal units can be claimed; e.g., when the interval
140-150 represents exactly the same amount as the interval 110-120. Such is
the case if we are measuring length in terms of inches or weighing objects
in terms of pounds. Without giving any reasons, we will make the
dogmatic-sounding assertion that very little "measurement" in psychology
involves equal units; the scales and tests provide a basis for ordering which
may be regarded as much better than subjective rank-ordering. For a scale
to be called a ratio scale it must have a true zero point, in addition to
qualifying as an interval scale.

Now it is claimed by some that the level of measurement definitely 
restricts possible statistical treatment of data. It is easy to see that adding 
numbers, used to code qualitatively different categories, in order to com- 
pute a mean is nonsensical. It is easy to see that a ratio scale is required 
for a meaningful coefficient of variation, (S/M). It is easy to see that rank 
positions as "scores" will lead to absurd standard deviations—if ten 
persons are ranked the ranks range from 1 to 10 whereas if twenty-five 
persons are ranked the rank scores range from 1 to 25—the amount of 
variation is a direct function of N, hence a σ or an S is not descriptive of
group variation. And it is easy to see that the adding of scores that do not 
qualify as being on an interval scale may make the mathematical purist 
dubious about the precise descriptive property of the mean, variance, and 
product moment r, all of which call for addition. 

The crucial question, however, is whether or not the F, t, and z tests 
can, in view of their dependence on means and variances, be safely used 
when the scale of measurement is, as is the rule in psychology, somewhere 
between the ordinal and the interval scales. The question boils down to 
this: Will Fs, ts, and zs follow their respective theoretical sampling
distributions when the underlying scores are not on an interval scale? The
answer to this is a firm yes provided the score distributions do not markedly
depart from the normal form. Nowhere in the derivations purporting to
show that various ratios will have sampling distributions which follow
either the F or the t or the normal distribution does one find any reference
to a requirement of equal units. The attaining of an interval scale of 
measurement, though desirable for some reasons, will not alter the risks 
of type I and type II errors when statistical inferences are made. 

There is, of course, no denying the fact that the type of data available 
does dictate the type of statistical technique that can be used. We have
already discussed methods for handling nominal data, either by the
binomial or by $\chi^2$, both of which may be called distribution-free methods
because no assumptions are made about the distribution of the variable or
variables underlying the categories. Data in the form of ranks may force
one to use Spearman's rho or Kendall's tau or the tests to be presented
later in this chapter.

In general, distribution-free methods, when applied for comparative
purposes to data which are normal or nearly normal, are not as sensitive
(that is, as powerful for avoiding type II errors) as the appropriate z,
t, or F technique. Consequently, in using a nonparametric method as a
short-cut, we are throwing away dollars in order to save pennies.

The sign test. Perhaps the simplest of all distribution-free methods 
is the "sign" test, which is applicable for testing the difference between 
two correlated sets of scores. The procedure is to consider the N pairs of 
differences, $X_1 - X_2$, some of which will be plus, some minus (with an 
occasional zero). If there is no difference between the two sets of scores 
we would expect the plus and minus signs to be equally divided. To test 
whether there are more plus signs than reasonable on a chance basis, the 
binomial, $(p + q)^N$ with p = .50, is used (N is for the pair differences 
having a sign; it is the sample size less the number of zero differences) 
in the manner discussed earlier (pp. 46-48). For effective N larger than 
10 we may use either the normal curve approximation to the binomial 
(pp. 44-46) or the $\chi^2$ approximation (pp. 209-12). Whether we use the 
binomial itself or one of the approximations, we must take care to secure 
a P that represents whichever test, one-tailed or two-tailed, is 
appropriate for the hypothesis being tested. 
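
The sign-test procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sign_test` and the scores in the usage note are hypothetical, and the two-tailed P is taken as twice the smaller tail, which is one common convention and not something the text prescribes.

```python
from math import comb

def sign_test(x1, x2):
    """Exact sign test for paired scores.

    Returns (n_plus, n_effective, p_one_tailed, p_two_tailed), where the
    one-tailed P is the chance probability of at least n_plus plus signs.
    """
    # Keep only the pair differences that carry a sign (zeros are dropped).
    diffs = [a - b for a, b in zip(x1, x2) if a != b]
    n = len(diffs)
    n_plus = sum(d > 0 for d in diffs)
    # Terms of the binomial (p + q)^n with p = q = .50.
    p_upper = sum(comb(n, k) for k in range(n_plus, n + 1)) / 2 ** n
    p_lower = sum(comb(n, k) for k in range(0, n_plus + 1)) / 2 ** n
    p_two = min(1.0, 2 * min(p_upper, p_lower))
    return n_plus, n, p_upper, p_two
```

For example, `sign_test([5, 7, 8, 6, 9, 4, 7, 8, 6, 5], [3, 6, 8, 4, 7, 5, 5, 6, 4, 3])` drops the one zero difference, leaving an effective N of 9 with 8 plus signs.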


The “median” test. A procedure for testing the difference between 


not exceed the median. 


Median test for more than two independent groups. This is a straightforward 
extension of the median test to provide an over-all test of 
the differences between, say, C inde 


of the median of the distribution of the C 
are dichotomized (as near th 


2 by C table from which one may obtain a 
freedom. 


Whether we are dealing with two groups or with C groups, the Ns for the 
groups need not be equal for use of the median test. 

Mean and variance for rank scores. Since the next four tests are 
based on ranks, we now digress to consider the mean and variance of the 
distribution of ranks, a rectangular distribution running from 1 to N when 
N persons have been ranked. Let X stand for any rank score. A little 
thought leads to the conclusion that $\bar{X} = (N + 1)/2$, hence 
$\sum X = N\bar{X} = N(N + 1)/2$ as the sum of the rank scores. In some college 
algebra textbooks there is proof that the sum of the squares of the first N 
natural numbers is given by N(N + 1)(2N + 1)/6. This gives us the value of 
$\sum X^2$. When the foregoing values for $\sum X$ and $\sum X^2$ are substituted 
into the general formula (3.6) for the sum of squares of deviations about the 
mean, we find after simplification that $\sum x^2 = (N^3 - N)/12$, hence the 
variance, 

$$\sigma^2 = \frac{\sum x^2}{N} = \frac{N^2 - 1}{12} \qquad (19.1)$$


It is no accident that 6, or 1/2 of 12, appears in the formula for rank- 
difference correlation, and that 12 appears in Sheppard's correction to S 
for the grouping error. Why in the latter? 
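
As a quick numerical check on the mean and variance of ranks given above, the following Python sketch computes both directly from the ranks 1 to N. The function name `rank_moments` is hypothetical, introduced only for this illustration.

```python
def rank_moments(n):
    """Mean and variance (dividing by n) of the rank scores 1, 2, ..., n."""
    ranks = range(1, n + 1)
    mean = sum(ranks) / n
    variance = sum((r - mean) ** 2 for r in ranks) / n
    return mean, variance

# Compare with the closed forms (N + 1)/2 and (N^2 - 1)/12 for several N.
for n in (5, 10, 37):
    mean, variance = rank_moments(n)
    assert abs(mean - (n + 1) / 2) < 1e-9
    assert abs(variance - (n ** 2 - 1) / 12) < 1e-9
```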

Mann-Whitney U test. This test, which is applicable only to results 
based on two independent groups, involves rank ordering the scores, for 
the two groups combined, from greatest (rank 1) to least (for which the 
rank will be $N = N_1 + N_2$ unless there are ties for the bottom position). 
When ties occur, each person involved is assigned the average of the ranks 
that would be assigned in case the tied persons could be differentiated 
(see p. 203). Then the ranks so assigned are summed separately for each 
group. Let $T_1$ and $T_2$ represent these two sums. (As a check on the 
arithmetic, $T_1 + T_2$ should equal $N(N + 1)/2$, the sum of the first N 
natural numbers.) 

When both $N_1$ and $N_2$ are 8 or greater, the statistic 

$$U_1 = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - T_1 \qquad (19.2)$$

is distributed normally about a chance expected value, or mean, given 
by $N_1 N_2/2$, and with variance of $N_1 N_2(N_1 + N_2 + 1)/12$. We then have 

$$\frac{x}{\sigma} = \frac{U_1 - N_1 N_2/2}{\sqrt{\dfrac{N_1 N_2 (N_1 + N_2 + 1)}{12}}}$$

as a unit normal deviate by which the significance of U as a deviation from 
the null hypothesis expected value is determined. If, as an alternate, we 
define U by replacing $T_1$ with $T_2$ and $N_1$ with $N_2$ (in the second term), 
we will have $U_2$. Now $U_1$ and $U_2$ will deviate to the same extent, but in 
opposite directions, from $N_1 N_2/2$. 

When $U_1$ is larger than $N_1 N_2/2$, the direction of the difference between 
the two sets of scores is such that group 1 is superior to group 2. (If ranks 
are assigned with the least score as rank 1, and so on, the value of $U_1$ will 
be smaller than $N_1 N_2/2$ when group 1 is superior.) For $N_1$ and $N_2$ less 
than 8, special tables are required for judging the significance of U. 
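
The computations for the U test can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `average_ranks` and `mann_whitney` are hypothetical, ranks run from greatest (rank 1) to least as in the text, and the normal deviate is meaningful only when both groups have 8 or more cases.

```python
from math import sqrt

def average_ranks(scores):
    """Rank scores from greatest (rank 1) to least; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney(group1, group2):
    """Return (U1, U2, deviate) following formula (19.2)."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    ranks = average_ranks(list(group1) + list(group2))
    t1, t2 = sum(ranks[:n1]), sum(ranks[n1:])
    # Arithmetic check: the two rank sums must total N(N + 1)/2.
    assert abs(t1 + t2 - n * (n + 1) / 2) < 1e-9
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - t1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - t2
    deviate = (u1 - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u2, deviate
```

With ranks assigned this way, a positive deviate corresponds to group 1 being superior, and $U_1 + U_2$ always equals $N_1 N_2$.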

Kruskal-Wallis one-way analysis of variance by ranks. This test is 
applicable for testing the difference between G independent groups, with 
varying numbers, $m_g$, of cases per group. All N ($N = \sum m_g$) scores are 
ranked with a rank of 1 assigned to the lowest score and a rank of N to 
the highest, but the group identity of the cases is maintained so that we 
can sum the rank scores within each group, which sum we will designate as 
$T_g$ for the gth group. Then the quantity 

$$H = \frac{12}{N(N + 1)} \sum \frac{T_g^2}{m_g} - 3(N + 1) \qquad (19.3)$$

is computed. Under null conditions (no difference in averages for the 
populations), and for all $m_g$ greater than 5, the sampling distribution of 
H follows closely the $\chi^2$ distribution with df = G − 1. 

When there are sets of ties, with $t_s$ the number of cases tied in the 
sth set, it is necessary to apply a correction to H: 

$$H_c = \frac{H}{1 - \dfrac{\sum (t_s^3 - t_s)}{N^3 - N}}$$

The corrected value will be higher than the uncorrected value and will 
therefore tend to help us reject the null hypothesis. If H is significant, 
we would not bother to compute $H_c$. 
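
A minimal Python sketch of formula (19.3) and the tie correction follows. The function name `kruskal_wallis` is hypothetical, and the tie correction is coded in its usual form (one minus the sum of $t^3 - t$ over $N^3 - N$), which is assumed here to be what the printed formula states.

```python
def kruskal_wallis(groups):
    """Return (H, H_c): formula (19.3) and its tie-corrected value."""
    scores = sorted(x for g in groups for x in g)
    n = len(scores)
    rank_of = {}    # average rank for each distinct score (rank 1 = lowest)
    tie_sizes = []  # number of cases sharing each distinct score
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[j + 1] == scores[i]:
            j += 1
        rank_of[scores[i]] = ((i + 1) + (j + 1)) / 2
        tie_sizes.append(j - i + 1)
        i = j + 1
    # Sum of T_g^2 / m_g over groups, T_g being the rank sum of group g.
    weighted = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h, h / correction
```

The function only does the arithmetic; whether H may be referred to the $\chi^2$ table still depends on the group sizes, as noted above.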

Friedman two-way 
mixed model situati 


$$\chi^2_r = \frac{12}{RC(C + 1)} \sum T_c^2 - 3R(C + 1) \qquad (19.4)$$


the chi square distribution, with df = C − 1. 




(C + 1)/2, thus “taking out" row differences. (Recall that for the ordinary 
two-way F test a row sum of squares was extracted.) Under null conditions 
that the original X scores all belong to the same population, i.e., that there 
is no column effect, we would expect the rank scores, RC in number, to be 
distributed evenly over the columns in such a way that the distributions 
within columns would be the same. Therefore, under null conditions the 
column means for the rank scores would tend to be the same (that is, 
we would expect equal $T_c$) and the within column variances would also 
tend to be the same. 

Next we note that with all the row means equal to (C + 1)/2, the mean 
of means, or total mean, would also be (C + 1)/2. When we consider the 
general expression for the sum of squares for the R x C interaction, 
$\sum\sum (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X}_{..})^2$, we see that when 
$\bar{X}_{r.} = \bar{X}_{..}$ this sum of squares reduces to 
$\sum\sum (X_{rc} - \bar{X}_{.c})^2$, which is nothing more than the sum of 
squares within columns. But under null conditions the distribution of 
ranks within any one column will, of course, be rectangular (the rank 
scores will run from 1 to C within a column) with variance that can be 
specified theoretically as $(C^2 - 1)/12$. Strictly speaking, the distribution 
of ranks theoretically specified under null conditions cannot be exactly the 
same within all columns unless R is a multiple of C. 

The $\chi^2_r$ of the Friedman test is an $F = s^2_c/\sigma^2_{rc}$, in which $s^2_c$ 
has C − 1 degrees of freedom and $s^2_{rc}$ has been replaced by the theoretical 
population variance, or a variance with df = ∞. As usual, when $n_2 = \infty$, 
F becomes a $\chi^2/n_1$, hence $\chi^2_r$ is $n_1$ times an F. Ties of ranks within 
rows do not disturb the Friedman test, and it is claimed that the Friedman 
test agrees very closely with an F test applied to the original X scores. 
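
The Friedman computation of formula (19.4) can be sketched in Python as below. The function name `friedman_chi_square` is hypothetical; the sketch assumes a table of R rows by C columns of raw scores, ranks the scores within each row (1 = lowest, ties sharing the average rank), and sums the ranks by column to get the $T_c$.

```python
def friedman_chi_square(table):
    """Return (chi_r, df) for an R-by-C table of raw scores, per formula (19.4)."""
    r, c = len(table), len(table[0])
    col_sums = [0.0] * c
    for row in table:
        for j, x in enumerate(row):
            below = sum(y < x for y in row)   # scores ranked under x in this row
            equal = sum(y == x for y in row)  # x itself plus any ties
            col_sums[j] += below + (equal + 1) / 2
    chi_r = 12 / (r * c * (c + 1)) * sum(t * t for t in col_sums) - 3 * r * (c + 1)
    return chi_r, c - 1
```

When every row orders the columns identically the statistic reaches its maximum, R(C − 1); when the column rank sums are all equal it is zero.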

Kendall's coefficient of concordance, W. Suppose C judges each rank 
order R individuals, and we wish a measure of the agreement among the 
judges. Arrange the rankings into a table of R rows and C columns. The 
rank scores in the cth column will run from 1 to R (except when ties occur 
for either the top or bottom position, which is unlikely in practice). Sum 
across columns and enter the several sums along the right-hand margin. We 
might regard these sums as "scores" for the R individuals. If there were 
perfect agreement among the judges, these sums would range from C to RC, with 
in between values of 2C, 3C, ..., (R − 1)C, though not necessarily in that 
order. Consider the variance of these sum scores. Since each sum is 
simply C times a rank, with the ranks running from 1 to R, the variance of 
these sum scores will be $C^2$ times the variance of the first R natural 
numbers. The maximum variance possible will be 

$$\sigma^2_{max} = \frac{C^2(R^2 - 1)}{12}$$

The same value can be obtained by considering the variance of the 
sums in terms of the variance theorem. The variance of the sum scores 
will be equal to the sum of the separate C variances, one for each column 
and all identical, i.e., all will equal $(R^2 - 1)/12$, plus a sum of C(C − 1)/2 
correlational terms of the form $2r_{ab}\sigma_a\sigma_b$. Under the condition of 
perfect agreement among the judges, all of these correlational terms will 
become $2(1)\sigma_a\sigma_b$, but since all columns have the same variance, 
$\sigma_a$ will equal $\sigma_b$ and each correlational term will become 
$2(1)\sigma^2$ in which $\sigma^2 = (R^2 - 1)/12$. Summing C(C − 1)/2 such terms 
yields $C(C - 1)\sigma^2$, which when added to the sum of the C variances (i.e., 
added to $C\sigma^2$) gives 

$$\sigma^2_{max} = C\sigma^2 + C(C - 1)\sigma^2 = C^2\sigma^2 = \frac{C^2(R^2 - 1)}{12}$$


In practice, perfect agreement will rarely if ever occur. As a measure 
of the extent of agreement, Kendall proposed that the variance of the 
obtained sums be taken relative to the maximum possible variance. 
Accordingly, the coefficient of concordance is defined as 

$$W = \frac{S^2_{sum}}{\sigma^2_{max}} = \frac{12}{C^2(R^3 - R)} \left[ \sum T_r^2 - \frac{(\sum T_r)^2}{R} \right] \qquad (19.5)$$

with $T_r$ as the total (or sum) for the rth row. Obviously, W can never be 
negative; it will be 1 when the agreement is perfect. The value of W tends 
to be higher than the average of all possible Spearman rank difference 
correlations between the judges. When ties occur, the denominator term 
for W becomes 

$$\sigma^2_{max} - \frac{C}{12R} \sum (t_s^3 - t_s)$$

in which $t_s$, the number of cases tied in the sth set of ties, will take on 
values 2, 3, 4, ..., and all the sets irrespective of their column location 
are included. 


For R > 7, W may be tested for significance by 

$$\chi^2_W = \frac{12 R S^2_{sum}}{CR(R + 1)} \qquad (19.6)$$

which follows approximately the $\chi^2$ distribution with R − 1 degrees of 
freedom. A significant W may be interpreted in two ways: either as 
indicative of better than chance agreement among the C judges or as a 
significant (reliable) difference among the R sums (or the R possible means) 
for the R individuals. But mere statistical significance may not be as 
crucial as the knowledge that W is fairly high. 

There is a direct connection between the Friedman test and the 
significance test for Kendall's W, although as significance tests they differ 
as to purpose. Friedman's test is concerned with the significance of the 
differences among the C column averages, the number of which is usually 
very small, whereas the test of W is concerned with the significance among 
R row averages, the number of which is usually not very small. Friedman's 
$\chi^2_r$ is, typically, used to test for the effect of a fixed constants factor, 
whereas typically $\chi^2_W$ tests for the significance of a random factor 
(individual differences) although applicable to the ranking of C objects by 
judges. Kendall's W provides a useful descriptive measure of agreement among 
judges, but such a measure is not a relevant part of the Friedman technique. 

It can be shown by simple algebra that the test for W can be written in 
the alternate form: 

$$\chi^2_W = \frac{12}{CR(R + 1)} \sum T_r^2 - 3C(R + 1) \qquad (19.7)$$

which bears a marked resemblance to the expression for $\chi^2_r$. Simply 
transpose the roles assigned to the rows and columns and also interchange 
the R and C designations in $\chi^2_r$ and you have $\chi^2_W$. 
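
The coefficient of concordance and its significance test can be sketched in Python. The function name `kendall_w` is hypothetical, and the sketch assumes an R-by-C table in which each column already holds one judge's ranks 1 to R with no ties. It computes W as the ratio of the variance of the row sums to the maximum possible variance, and the test statistic from the alternate form (19.7).

```python
def kendall_w(table):
    """Return (W, chi_w, df) for R individuals (rows) ranked by C judges (columns)."""
    r, c = len(table), len(table[0])
    row_sums = [sum(row) for row in table]
    mean_sum = sum(row_sums) / r
    # Variance of the obtained sums relative to the maximum possible variance.
    var_sum = sum((t - mean_sum) ** 2 for t in row_sums) / r
    var_max = c ** 2 * (r ** 2 - 1) / 12
    w = var_sum / var_max
    # Alternate form (19.7) of the significance test.
    chi_w = 12 / (c * r * (r + 1)) * sum(t * t for t in row_sums) - 3 * c * (r + 1)
    return w, chi_w, r - 1
```

The two results are tied together algebraically: the statistic equals C(R − 1)W, which makes a convenient check on the arithmetic.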


Chapter 20 


REMARKS ON ERROR 
REDUCTION 


standing of them i 
chapters. 

For our present purpose, we shall subsume errors under three headings: 
measurement or observational errors, errors in inferring population 
parameters in field or survey studies, and errors in experimental testing 
of hypotheses. About the first of these, we remark only that errors of 
measurement can be reduced by developing more reliable tests or (when 
feasible) by averaging repeated measurements. 


FIELD STUDIES (SURVEYS) 


limit ourselves to three sampling methods: 
Random sampling. The conditions of г; 
specified earlier (p. 51). By the method of 
of a certain grade in a city, we can secure it by a purely mechanical scheme, 
such as taking every nth card from the files. Although this type of 
systematic sampling does not exactly satisfy the conditions of random sampling, 
it will assure a random sample unless the cards have been systematically 
arranged (in a somewhat peculiar order). The use of the random method 
for sampling an uncatalogued population involves so many difficulties in 
psychological research that no schemes are to be found in the literature. 

Increasing sample size is the only way by which we can reduce chance 
errors when the random method is being employed. That sheer sample 
size is not enough to reduce nonrandom errors is evidenced by the 
Literary Digest straw polls, which rested on the assumption that the 
population of telephone subscribers and car owners was not different in 
its voting preference from the entire population of potential voters. This 
happened to hold before 1936, so that replies to ballots mailed at random 
to telephone subscribers and car owners forecasted fairly accurately the 
election results. Despite a very large sample, the Digest poll failed 
miserably in 1936; this failure is attributed to the alignment of voting to 
income levels, an alignment that did not exist in prior years. 

Stratified sampling. In the stratified method, one or more individuals 
are pulled at random from each of several strata, the number in the sample 
from each stratum being proportional to the universe number in the 
stratum, and the strata are predetermined by knowledge of some control 
variable or variables. Psychologists who sample so as to secure proportion- 
ate representation from the several occupational levels are, in effect, using 
the principle of stratification. It should be obvious that the method can be 
used only when information is available on some variable or variables 
which permits their use in setting up the strata, and when cases within the 
strata can be drawn randomly. 

When the sampling is for attributes by the stratified method, the standard 
error of an obtained proportion, P, is given, in terms of information yielded 
by the sample, approximately by 

$$S_P = \sqrt{\frac{PQ - S^2_p}{N}} \qquad (20.1)$$

where P equals the proportion in the total sample, N, who possess the 
attribute, Q = 1 − P, and $S^2_p$ is the weighted variance of the several 
strata proportions about the sample value P. A casual examination of 
formula (20.1) shows that the magnitude of the error is less for a stratified 
sample than for a random sample, and that the increase in precision 
depends on our ability to stratify the universe in such a way as to secure 
strata which are really different with regard to the attribute being studied. 

For stratified sampling, the variance of the mean may be written as 

$$S^2_{\bar{X}} = \frac{1}{N}\left(S^2 - S^2_{\bar{X}_s}\right) \qquad (20.2)$$

where $\bar{X}$ = the sample mean, $S^2$ = the sample variance, and 
$S^2_{\bar{X}_s}$ = the weighted variance of the means of the several strata 
about the total sample mean. If stratification has been accomplished by use 
of a variable, Y, which is linearly related to the variable being studied, 
the formula can be written in the form 

$$S^2_{\bar{X}} = \frac{1}{N}\left(S^2_x - S^2_x r^2_{xy}\right) \qquad (20.3)$$

It will be noticed that stratified sampling does lead to greater precision in 
the sense of smaller chance error, but only when the control or stratifying 
variable is related to the variable being studied. 
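
The effect of stratification on the standard error of a proportion can be illustrated with a short Python sketch. The function name `stratified_se` and the strata counts in the usage note are hypothetical. The sketch assumes that formula (20.1) has the form $\sqrt{(PQ - S^2_p)/N}$ and that "weighted" means weighting each stratum by its share of the sample.

```python
from math import sqrt

def stratified_se(counts, sizes):
    """Return (P, SE under random sampling, SE under stratified sampling).

    counts[i] is the number possessing the attribute in stratum i;
    sizes[i] is the number of cases sampled from stratum i.
    """
    n = sum(sizes)
    p = sum(counts) / n
    q = 1 - p
    # Weighted variance of the strata proportions about the sample value P
    # (weights assumed proportional to stratum sample size).
    s2_p = sum(m * (k / m - p) ** 2 for k, m in zip(counts, sizes)) / n
    se_random = sqrt(p * q / n)
    se_stratified = sqrt((p * q - s2_p) / n)
    return p, se_random, se_stratified
```

For instance, `stratified_se([80, 50, 20], [100, 100, 100])` gives a smaller stratified error than random-sampling error, whereas strata with identical proportions give no reduction at all.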

The quota method involves the use of strata, but selection within the 
strata is not done on a random basis; the field worker merely fills a quota 
by securing the correct proportion per stratum, and selective factors leading 
to bias can easily operate. 

Area sampling. There is evidence that area or "pin point" sampling 
is the best method yet devised for drawing samples in survey studies. Its 
use, however, depends on the availability of extensive facilities. The 
student who is interested in this, or the stratified, method will wish to turn 
to detailed treatments of the subject.* 


SAMPLING ERRORS IN EXPERIMENTATION 


The formation of groups for experimental 


groups, (2) by pairing, (3) by using sibs or litter mates, (4) by matching 


by using the same person under all the experimental 


conditions. The last mentioned will not be feasible when practice or 
fatigue effects are likely, 


* Yates, F., Sampling methods for censuses and surveys, New York: Hafner, 1949; 
Deming, W. E., Some theory of sampling, New York: Wiley, 1950. 

scores on the same persons are large. The 
foregoing argument holds, of course, for just two experimental groups (or 
an experimental and a control group) as well as for three or more groups. 
Thus, compared to method 1 (random assignment), greater precision is 
attainable by using method 2 or 3 or 5. Before discussing method 4, let us 
again consider the situation where groups are needed for just two 
conditions. If the groups are formed by pairing individuals, the sampling 
variance of the difference between the two means is, as we learned in 
Chapter 6, given by 

$$S^2_{\bar{D}} = S^2_{\bar{X}_1} + S^2_{\bar{X}_2} - 2 r_{12} S_{\bar{X}_1} S_{\bar{X}_2} \qquad (6.8)$$

The gain in pairing, over random assignment, depends on the magnitude 
of $r_{12}$. It can be shown that if the pairing is done on the basis of 
variable Y, the value of $r_{12}$ will be $r^2_{xy}$, and in case two or more 
variables are controlled by pairing, $r_{12}$ will be the square of the multiple 
correlation between the dependent variable, X, and the control variables. The 
reason for pairing, it will be recalled, is to make the groups comparable on 
certain variables which might affect the outcome of the experiment. We now see 
explicitly that the advantage of pairing depends definitely on how highly the 
variables, so controlled, are correlated with the dependent variable. No 
correlation, no gain; low correlation, little gain. 
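
The "no correlation, no gain" point can be made concrete with a small Python sketch built on formula (6.8). The function names `variance_of_difference` and `pairing_ratio` are hypothetical. The sketch assumes equal sampling variances for the two means, treats random assignment as the case in which the correlation term is zero, and uses the text's result that pairing on Y makes the correlation between paired scores equal to the square of the correlation between X and Y.

```python
def variance_of_difference(s2_mean1, s2_mean2, r12=0.0):
    """Formula (6.8): sampling variance of the difference between two means."""
    return s2_mean1 + s2_mean2 - 2 * r12 * (s2_mean1 * s2_mean2) ** 0.5

def pairing_ratio(s2_mean, r_xy):
    """Paired variance as a fraction of the variance under random assignment."""
    unpaired = variance_of_difference(s2_mean, s2_mean, 0.0)
    paired = variance_of_difference(s2_mean, s2_mean, r_xy ** 2)
    return paired / unpaired

# The ratio works out to 1 - r_xy^2: a control variable correlating .50
# with X still leaves three fourths of the error variance.
for r_xy in (0.0, 0.3, 0.5, 0.9):
    assert abs(pairing_ratio(4.0, r_xy) - (1 - r_xy ** 2)) < 1e-12
```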

Method 4 is another way of making groups comparable on pertinent 
variables. Instead of pairing persons, distributions are matched for the Y 
variable, to be controlled, in such a manner that the two groups contain 
the same proportions of cases in the several intervals as hold for a supply 
distribution on Y. The sampling variance of the difference between the 
two X means is given by 

$$S^2_{\bar{D}} = S^2_{\bar{X}_1}(1 - r^2_{xy}) + S^2_{\bar{X}_2}(1 - r^2_{xy}) \qquad (20.4)$$

If the matching has been made on the basis of several control variables, 
the two correlations (one for each group) become the multiple rs between 
X and the control variables. 

From (20.4) we may deduce the following fact. Where two groups have 
been separately matched as to distribution on the same control variable(s), 
the standard error of the difference can be obtained without the restriction 
of the ordinary pairing procedure, which requires that there be an equal 
number of cases in the two groups. The reader will note that either term 
in formula (20.4) is, as might be expected, identical to formula (20.3) for 
the sampling variance of a mean when the stratified method is used. The 
method of matching distributions is particularly useful when the cost per 
case is much greater in the experimental group than in the control group. 
Precision can be increased by taking a larger control group, a possibility 
also when the groups are chosen by randomization. 

The use of paired individuals for experimental (and control) conditions 
has long been recognized as a sound procedure. We might argue, however, 


faith in the randomization Process. Random differences between groups 


the error formulas 
always include all random effects. When pairing leads to only a slight 


€ that the pairing procedure may not 


essential. (It is an interesting commentary 
"control" that very frequently a control grou 
supervised random, or a definite pairing, procedure would have avoided 
this selective bias, but what is more important and more relevant to our 
present topic is the claim in a paper† by "Student," so far not refuted, that 
the use of 50 pairs of identical twins would have yielded as precise 
information at only 2 per cent of the cost of the original experiment, or at 
a saving of approximately 35,000 prewar dollars. 


† "Student," The Lanarkshire milk experiment, Biometrika, 1931, 23, 398-406. 


EXERCISES AND QUESTIONS* 


CHAPTER 2 
2.1. a. Make separate frequency distributions for the marks of the two groups 
of students in Table I. Use intervals of size 5. 
b. Determine also the cumulative frequencies for each group. 


Table I. Final examination marks for a class in statistics 


Students with No Calculus 


Students with Some Calculus 
(N = 36) 


(N = 22) 


103 150 139 79 150 134 137 
98 79 94 137 IB 113 
106 93 106 137 91 109 
71 101 92 74 106 87 
108 113 103 108 114 105 
120 95 83 93 109 97 


139 112 139 
151 124 80 153 
131 94 96 77 
133 123 101 115 
115 90 154 122 
111 135 


2.2. a. Make separate frequency distributions for the two groups of scores in 
Table II. Use intervals of size 3. 
b. Determine also the cumulative frequencies for each group. 

2.3. a. Draw a frequency polygon for the distribution in Table III, part A. 
b. Draw an ogive for the data of Table III, part A. 

2.4. a. Draw a frequency polygon for the distribution of Table III, part B. 
b. Draw an ogive for the data of Table III, part B. 

* These are so arranged that frequently an even-numbered exercise is of the same type 
as its immediately preceding odd-numbered exercise. "Thought" questions are intended 
as thinking, not thumbing, exercises. 




Table II. Scores on final examination for a course on psychological tests 


Undergraduates (N = 32) Graduate Students (N = 23) 
70 72 76 66 76 80 84 80 90 82 84 
67 69 90 50 76 47 79 62 77 89 70 
51 58 71 88 65 54 73 74 87 76 79 
89 64 80 67 71 90 85 95 78 69 97 
91 71 63 81 87 81 78 86 92 
79 79 

CHAPTER 3 


3.1. For the scores of Table I, compute separately for the two groups: 

a. the medians, using the undistributed scores. 

b. the medians, using the frequency distributions. 
3.2. Repeat exercise 3.1 with the data of Table II. 
3.3. Compute the mean for each group in Table I by 

a. the definition formula for the mean. 

b. the arbitrary origin method. 
3.4. Repeat exercise 3.3 with the data of Table II. 
3.5. Combine the two distributions for the data of Table I, compute the mean 
by the arbitrary origin method, and check by using the formula for securing the 
mean for a combined group (use the means obtained by the arbitrary origin 
method for this check). 
3.6. Repeat exercise 3.5 with the data of Table II. 
3.7. Compute the median, $Q_1$, and $Q_3$ for the distribution in Table III, part A. 
3.8. Compute the median, the 20th and the 80th percentile points for the 
distribution in Table III, part B. 
3.9. Using the results of exercise 3.7, locate the three points, $Q_1$, the median, 
and $Q_3$, on the base line of your ogive curve for the distribution of Table III, 
part A. Divide the ordinate on the right-hand side (the ordinate at IQ = 170) 
into approximate fourths. Draw a line from each of the three base-line points up 
to the ogive, then horizontally to the right. Notice where these horizontal lines 
hit the ordinate on the right-hand side. 
3.10. Using the results of exercise 3.8, locate the three points, the median, 
$P_{20}$, and $P_{80}$, on the base line of your ogive curve for the distribution of 
Table III, part B. Divide the ordinate on the right-hand side (the ordinate at 
IQ = 180) into approximate fifths. Draw a line from each of the three base-line 
points up to the ogive, then horizontally to the right. Note where these 
horizontals hit the ordinate on the right-hand side. 
3.11. Compute the standard deviations for the two groups in Table I (use 
arbitrary origin method). 
3.12. Repeat exercise 3.11 with the data of Table II. 


Table III. Distribution of IQs, form L of 1937 Stanford-Binet scale 

               A. Ages 21-51       B. Ages 6-13 
   IQ            f     cuf           f     cuf 
170-179                              1    1623 
160-169          4     728           1    1622 
150-159          4     724           3    1621 
140-149         11     720          29    1618 
130-139         41     709          73    1589 
120-129         82     668         140    1516 
110-119        175     586         308    1376 
100-109        193     411         407    1068 
 90-99         107     218         335     661 
 80-89          76     111         215     326 
 70-79          20      35          76     111 
 60-69           7      15          30      35 
 50-59           5       8           4       5 
 40-49           2       3           1       1 
 30-39           1       1 
               N = 728             N = 1623 


3.13. For the distribution of IQs in Table III, part A, the mean is 106.68 and 
the standard deviation is 17.41. 
a. Determine the two points defined by M ± S. 
b. Determine the two points defined by M ± 2S. 
c. Locate these four points, also the me 

3.14. The distribution of IQs in Table III, part B, has a mean of 103.34 and an S 
of 16.88. Repeat exercise 3.13, using these values and polygon for the data of 
Table III, part B. 

3.15. Suppose the mean score on a statistics quiz is 35, the median is 36, the S is 
6, and the quartile deviation is 4. 

new mean and S? 

3.16. Given that the distribution of scores on a quiz leads to a mean of 40, a 
median of 38, an S of 9, and a quartile deviation of 6. 
a. If we added 10 points to the scores of each student, what would be 
the values for M, Mdn, S, and Q? 
b. If all scores were halved, what would be the values of the mean and 
the S? 
3.17. If you were told that the mean final score for the 50 students was 80 and 
the mean for the 30 men in the class was 82.3, what would you figure as the mean 
for the women? 
3.18. Given that the mean weekly pay of the 7 working members of the Jones 
family is $55 and the median is $50 (both after deductions). 
a. What is the weekly "take home" of the family? 
b. Suppose that Daddy Jones, already the best paid, receives an increase 
which after deductions amounts to $6 a week. What is the new mean? 
What is the new median? 
3.19. If an S is 9 when computed from a frequency distribution with intervals 
of size 6, what would you expect it to be if computed by using the definition 
formula for S? 
3.20. How large is the grouping error in an S of 13 computed from a 
distribution with intervals of size 12? 
3.21. Why would we usually expect the difference between the 10th and 20th 
percentile points to exceed the difference between the 40th and 50th percentile 
points? 
3.22. Suppose that A knows only that the Q of a distribution is 20, whereas B 
knows that the 75th percentile is 30 units from the median and the 25th percentile 
is 10 units from the median. What can B tell about the distribution that A 
cannot? 
CHAPTER 4 


4.1. Assume that the IQs for a large number of unselected elementary school 
children are distributed as a normal curve with a mean of 100 and an S of 17. 

a. The first quartile point will be near what value? 

b. The percentage with IQs above 130 will be? 

c. The middle 80 per cent will fall between what values? 

d. The 99th percentile will be near what IQ value? 

e. The percentage with IQs below 70 will be? 

4.2. Let us presume that the Army General Classification Test yields a normal 
distribution of scores, with mean of 100 and S of 20. 

a. The value of the third quartile will be near what score? 

b. The first percentile point will be at what score? 

c. Between a score of 70 and a score of 130 will be found what percentage 
of the cases? 

d. The middle 60 per cent of scores will fall between what score values? 

e. The value of the quartile deviation will be what? 

4.3. One way to comprehend the meaning of either sizable or small differences 
between groups is to consider the extent to which the distributions overlap. 
Given the following data for weights of college students: 
Men: M = 142, S = 15; Women: M = 120, S = 12 

Assuming normality for both distributions, how many men per thousand are 
lighter than the average woman? Determine the number of women per thousand 
who are heavier than the average man. 

4.4. If the mean height for college men is 68.5 inches and the S is 2.8, and if the 
mean height for college women is 64.5 and the S is 2.5, what proportion of women 
exceed the average man in height? What proportion of men fall below the average 
height for women? 

4.5. Suppose that the distribution of numerical grades in a course is normal 
with a mean of 60 and an S of 10. The instructor wishes to assign letter grades as 
follows: 15 per cent As, 35 per cent Bs, 35 per cent Cs, and 15 per cent Ds. 


g line between the As and Bs, between 
the Bs and Cs, and between the Cs and Ds. 


4.6. Suppose that it has been decided to use a five-letter grading system, A, B, 
C, D, and E, and that it is required that the letters shall correspond to “equal” 
distances on the base line, the whole of which is taken to be 6 Ss. Assuming 
normality, what percentage would be assigned As; Bs; Cs? 

4.7. Determine the height of the unit normal curve at the point which is 1.2 
$x/\sigma$ units below the median; at the third quartile point. 

4.8. What is the height of the ordinate of the unit normal curve corresponding 
to the $x/\sigma$ value that cuts off the upper 10 per cent of the curve? The lower 
25 per cent? 

4.9. Frequently, we must be able to translate percentile scores to standard 
scores and vice versa (assume normality). 

a. What are the standard scores (to the nearest tenth) which correspond 
to the following percentiles: 44th, 99th? 

b. What are the percentile equivalents (to nearest value) of the following 
standard scores: −1.34, +2.06? 

4.10. Suppose a typical bell-shaped distribution. What is the approximate 
percentile value of the following points: the mean, $Q_3$, the point which 
is one S above the mean, and the first decile point? 

4.11. What is the $x/\sigma$ distance between the following (assume normality): 
a. the 10th and the 90th percentile points? 
b. the 25th and the 75th percentile points? 

4.14. If a student's reading rate score falls at the 20th percentile, and his 
standard score on reading comprehension is −1.4, would you conclude that his 
comprehension was superior to his rate? Why? 
4.15. M ± 2 quartile deviations will give two points within which, for normal 
distributions, one would expect about what per cent of the cases? 
4.16. For Test A the scores are transformed to Z scores with M = 50 and S 
= 10 and for Test B the scores are transformed to T scores with M = 50, 
S = 10. Why may a score of 60 on Test A be not comparable to a score of 60 
on Test B? 
4.17. Suppose we have a distribution with skewness of .60. If we transformed 
the scores into standard scores; also to T scores; and also to percentiles; 
what can you say regarding the shape of the distribution of 

a. the standard scores? 

b. the T scores? 

c. the percentile scores? 
4.18. Given that the distributions for two groups, with Ns equal, are normal 
in form. Now consider the distribution for the two groups combined. Under 
what condition would you expect the shape of the combined distribution to be: 
Platykurtic? Leptokurtic? Normal? 


CHAPTER 5 


5.1. If you tossed 4 unbiased pennies 160 times, how often would you expect 


to have 2 heads and 2 tails? 
5.2. Suppose you roll a pair o 
exactly 11 spots will turn up? 
5.3. Suppose that you are rolling 2 fair dice, 
What is the probability of obtaining a 3 spot on t 
white one? 

54. In that back-alley game kno 


f fair dice once. What is the probability that 


one red and the other white. 
he red die and a 4 spot on the 


wn as “crap shooting," the obtaining of spots 
on the 2 dice totaling 7 seems to be of paramount importance at certain times. 
What is the probability of rolling a 7 (assume gentlemen's dice)? 

5.5. Suppose that we have 3 pyramidal objects (perfectly homogeneous) which 
can be rolled like dice. The sides of each are numbered 1, 2, 3, 4; and success 
is defined as the getting of 4s on the down sides. Determine the рыу for 
obtaining exactly three 4s; exactly two 4s; exactly one 4; and no 4s. What is 
the probability of securing at least two 4s? 

5.6. If you were dealt 1 card from each of 5 well-shuffled decks, what is the
probability of all 5 cards being spades?
5.7. The probability of drawing a red card from an ordinary (and well-shuffled)
deck is 1/2 and the probability of drawing a heart is 1/4. Why isn't 1/2 plus 1/4 the
probability of drawing either a heart or a red card, or is it?

5.8. Suppose that for a class of 100 the number of As given on the first quiz
is 15 and that the number of As on the second quiz is also 15. Suppose further
that the names of the students are placed on slips which are then well mixed
in a hat. We might say that the probability is .15 that a name drawn from the
hat will be that of a student who received an A on the first quiz; likewise,
the second quiz. Why might it be erroneous to say that the probability is .15
times .15 that the drawn name belongs to a student who made As on both
quizzes?

5.9. A student takes a four-alternative, 12-question multiple-choice test. If he
merely guesses, what is the probability that he will get all 12 questions correct?
5.10. The typical ESP deck consists of 25 cards, with 5 cards for each of 5
symbols. The person who claims extra-sensory perception (ESP) ability attempts
to name the symbol on the cards as they are exposed (one at a time, after
shuffling) by the experimenter in a room remote from the person. The "score" is the
number of correct calls in a run through the pack.

a. What are the numerical values for p, q, and n for the binomial
distribution?

b. Would "scores" of 3 and 7 be equally likely on a chance basis? Why?


5.11. a. Toss 6 coins 64 times; for each toss tally the number of heads that
turn up, thereby obtaining a frequency distribution with an N of 64.
Label this Series A. Toss the coins 64 more times, and label the
resulting distribution as Series B. Then combine the two distributions.

b. Using the binomial expansion, ascertain the expected distribution
when 6 coins are tossed 64 times; 128 times.

c. Compute the mean and standard deviation for each of your three
distributions; also for the expected distribution (round to 2 decimals).

d. Determine the proportion of times that 3 heads, also 6 heads, turned
up in each series, and in the combined series. Compare these results
with the expected proportions.

e. Subtract the mean of Series A from that of Series B (keep sign if
negative). For the proportion of times 3 heads turned up, subtract
the Series A proportion from that for Series B (keep sign).

f. Bring all the results to class so that frequency distributions may be
made for Ms, Ss, proportions, and differences between Ms and between
proportions.

5.12. Do exercise 5.11, using 7 coins.
5.13. If 42 of 60 rats turn to the right at the first choice point in a maze, would
[...] prefer to turn to the right at this choice point?
5.14. [...] cent of all eligible voters favor the Democrats,
[...] random samples of size 400 yield percentages of
55 or over as favoring Democrats?

5.15. Items on an intelligence test of the Binet type are at times assigned an 
index of difficulty which is nothing more than the percentage passing the item. 
Given the following for an item: of 100 12-year-olds, 60 per cent passed; of 
100 13-year-olds, 80 per cent passed. When possible sampling errors are
considered, would you conclude from these two difficulty indices that the item is
really more difficult for 12-year-olds? State the significance level associated with
your conclusion.
5.16. If a political issue is favored by 55 per cent of a sample of 200 Republicans,
and by 46 per cent of a sample of 250 Democrats, would you conclude that the 
populations of Republicans and Democrats differ on the issue? 
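
For comparisons of two independent proportions such as those in 5.15 and 5.16, the arithmetic might be organized as in the sketch below. It assumes the large-sample normal approximation and a standard error built from the pooled proportion under the null hypothesis; the function names are hypothetical, and the pooling choice is an assumption rather than a statement of the text's own formula.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative probability of the unit normal curve up to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_proportion_z(p1: float, n1: int, p2: float, n2: int):
    """z and two-tailed P for the difference between independent proportions.

    Assumption: the standard error uses the pooled proportion, as is
    customary when testing the null hypothesis of no difference.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    two_tailed_p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, two_tailed_p


# Figures taken from exercise 5.16: 55 per cent of 200 vs. 46 per cent of 250.
z, p_value = two_proportion_z(0.55, 200, 0.46, 250)
print(f"z = {z:.3f}, two-tailed P = {p_value:.4f}")
```

The sketch applies only to independent samples; related proportions, as in the item comparisons of 5.17, call for a different error term.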
5.17. a. Given the data in Table IV, do items a and b differ significantly in
difficulty for the 4-year-olds? Ditto, the 5-year-olds?
b. Is there a significant difference between 4- and 5-year-olds on item a?
On item b?


Table IV. Data for passing (P) and failing (F) items on the Stanford-Binet Test


          4-year-olds                  5-year-olds
      Item         Item           Item         Item
Case  a  b   Case  a  b     Case  a  b   Case  a  b
  1   F  P    21   P  F      41   P  P    61   P  P
  2   P  F    22   P  F      42   P  P    62   P  P
  3   P  ?    23   P  F      43   P  F    63   P  F
  4   F  P    24   P  P      44   F  F    64   P  P
  5   P  F    25   P  P      45   P  P    65   F  F
  6   F  F    26   P  P      46   P  P    66   P  P
  7   F  P    27   P  F      47   F  ?    67   F  F
  8   P  F    28   F  F      48   P  P    68   P  P
  9   P  F    29   P  P      49   P  P    69   P  P
 10   F  F    30   P  F      50   P  P    70   F  F
 11   P  P    31   ?  ?      51   P  P    71   F  P
 12   P  P    32   P  F      52   P  P    72   F  F
 13   F  F    33   P  F      53   P  P    73   P  F
 14   P  P    34   F  F      54   P  P    74   P  F
 15   P  F    35   P  P      55   F  P    75   F  F
 16   P  F    36   P  F      56   P  F    76   F  F
 17   P  F    37   P  P      57   P  P    77   P  F
 18   F  F    38   F  F      58   P  P    78   P  F
 19   P  F    39   F  F      59   P  F    79   F  P
 20   P  P    40   P  F      60   P  F    80   P  F


5.18. a. Would you conclude from the data of Table V that items c and d differ
significantly in difficulty for the 6-year-olds? Ditto, the 7-year-olds?

b. Would you conclude from the data of Table V that, in general,
7-year-olds are more successful than 6-year-olds on item c? On item d?


Table V. Passing (P) and failing (F) information on two Binet Test items at two
age levels


          6-year-olds                  7-year-olds
      Item         Item           Item         Item
Case  c  d   Case  c  d     Case  c  d   Case  c  d
  1   F  F    21   P  P      41   P  F    61   P  F
  2   P  P    22   P  F      42   P  P    62   F  F
  3   F  P    23   F  F      43   F  P    63   P  F
  4   F  F    24   F  F      44   P  P    64   F  F
  5   ?  ?    25   F  F      45   F  P    65   P  P
  6   F  F    26   P  P      46   F  F    66   P  P
  7   P  F    27   ?  F      47   P  P    67   P  P
  8   F  F    28   P  F      48   F  P    68   P  P
  9   P  F    29   F  F      49   P  P    69   P  P
 10   F  F    30   F  F      50   P  F    70   P  P
 11   F  F    31   F  F      51   F  F    71   P  P
 12   P  P    32   F  F      52   F  F    72   P  P
 13   F  F    33   P  F      53   F  F    73   P  F
 14   P  F    34   P  F      54   P  P    74   P  F
 15   F  F    35   F  F      55   F  F    75   F  F
 16   P  P    36   F  F      56   P  P    76   P  P
 17   F  F    37   F  P      57   F  F    77   P  F
 18   F  F    38   P  F      58   P  P    78   P  F
 19   F  P    39   F  F      59   P  P    79   P  F
 20   F  F    40   P  P      60   F  F    80   P  P


5.19. In a student presidential election, Mr. Ralph received 2389, or 60 per
cent, of the votes cast. Suppose that you had been able to poll a sample of 100
the day before the election. Assuming that such "last day" changes as took
[...]

5.20. A sample of N = [...] yeses to a question. Under
what condition would you expect a large number of successive random samples
to yield percentages of yeses that would average 55?

5.21. Suppose a situation involving the difference between groups, which calls
for a two-tailed test of significance, and that we have decided on P = .01 as
our level of significance.
a. What is the probability of committing a type I error if the null
hypothesis is really true?
b. If the null hypothesis is not true, what is the probability of making a 
type I error? 
c. If the true difference were 3.3, what additional information would you 
need in order to figure out the probability of making a type II error? 
5.22. Let beta stand for the probability of correctly rejecting the null hypothesis 
and let one minus beta stand for the probability of making the type II error. 
Under what condition could these two probabilities be equal? 
5.23. For a sample of 100 it is found that 60 say yes and 40 say no when asked a
certain question. For the difference, .60 - .40, between the two proportions
why would it be incorrect to take the square root of $p_1q_1/100 + p_2q_2/100$ as the
standard error of the difference?
5.24. Consider the setup for testing the difference between two nonindependent, 
or related, proportions via the square root of (a + d)/N. Although we have not 
explained the concept of correlation (or association), do you see a basis for 
saying that (a + d)/N tends to be smaller the higher the correlation between 
the two sets of responses? 
5.25. Suppose percentages of 37 and 39 are found for two samples, each of size 
100, drawn from a defined population. Since a difference as large as 2 percentage 
points can easily, for the given Ns, arise on a chance basis, it would seem safe to 
conclude that the samples are in very close agreement. From this degree of
similarity in results, would you conclude that the sampling method has avoided 
bias? Explain or defend your answer. 
5.26. Some textbooks have argued that whether or not a sample is
representative (i.e., not biased) can be judged by splitting it (the sample) into random halves,
and then claim representativeness if the means for the two halves are not
significantly different. Any comment?


CHAPTER 6 


6.1. For a sample of 2970 cases, ages 2.5 to 18, the distribution of IQs on
Form L of the 1937 Stanford-Binet yields:

Mean = 104.00     Skewness (g1) = .028
S    =  17.03     Kurtosis (g2) = .346

In answering the following questions, indicate the steps in your computations.
a. Would you conclude that the mean IQ of the population for these ages
is 100 (the value expected for a properly constructed IQ test)?
b. Is it reasonable to believe that the IQ distribution for the population,
at these ages, has normal skewness?
c. Would you conclude from the sample kurtosis that the kurtosis for
the population differs from normal kurtosis?




6.2. Suppose that the mean IQ for the general population is 100 and the stand- 
ard deviation is 17. If a sample of 289 cases were drawn at random, what would 
be the probability of obtaining a mean as great as 101? As low as 98? 
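
Questions of the kind posed in 6.2 turn on the standard error of the mean and the unit normal curve. A minimal sketch follows, assuming a normal sampling distribution of means and a known population standard deviation; the helper names are hypothetical and the normal curve is evaluated through the error function rather than a printed table.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative probability of the unit normal curve up to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def standard_error_of_mean(sigma: float, n: int) -> float:
    """Standard error of a sample mean for a known population sigma."""
    return sigma / sqrt(n)


# Values stated in exercise 6.2.
mu, sigma, n = 100.0, 17.0, 289
se = standard_error_of_mean(sigma, n)

# Probability of a sample mean as great as 101, and as low as 98.
p_as_great_as_101 = 1.0 - normal_cdf((101.0 - mu) / se)
p_as_low_as_98 = normal_cdf((98.0 - mu) / se)

print(f"standard error = {se:.2f}")
print(f"P(mean >= 101) = {p_as_great_as_101:.4f}")
print(f"P(mean <= 98)  = {p_as_low_as_98:.4f}")
```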


6.3. Suppose it is known that the standard deviation of scores for a population
is 20. How many cases would you need to draw in order that the standard
error of

a. a sample mean be 2 score points?
b. a sample S be 3 points?


6.4. Suppose that you are polling on an issue for which opinion seems about
equally divided. How many cases (how large an N) would you need to be sure
(at the .01 level of significance) that a sample deviation of 3 per cent from 50 per
cent is nonchance?


6.5. One of the requirements of a good IQ test is that the mean IQ for
unselected cases of any school age group shall be 100, and that the distributions for
the several age groups shall have the same standard deviations. Given the
following for the 1937 Stanford-Binet Test:


Age        6       12
N        203      202
M      101.0    103.6
S       12.5     20.0


a. Is it reasonable to believe that the test is yielding the desired mean when
used with 12-year-olds?
b. Would you judge from the results for these two age groups that the
requirement of equal variability has been met?
6.6. The means and standard deviations for two groups of twins on spool
packing are as follows:


     Fraternals   Identicals
N        92           94
M       761          741
S        79           66


[...]

[...] given to a group. For 202 cases of age 7, we
[...]:


     Form L   Form M
M    101.8    103.5
S     16.2     15.6

In order to balance practice effect, one-half the group was tested on Form L,
then on Form M, whereas the reverse order was used for the other half. The 
correlation between the two sets of IQs was .93. Is the obtained difference 
between means larger than one would expect on the basis of chance sampling? 
Ditto, the difference between the Ss? 
6.8. Measurements on 1000 of each sex at birth have been reported in the 
literature. The mean length of boys (in centimeters) was 50.51 and the S was 
2.99, and the values for the girls were 49.90 and 3.00. Is there evidence here for 
sex difference in length at birth? 
6.9. Given a two-tailed test of the hypothesis that no change has taken place 
and that the standard error of the mean change is 3 points and that we use the 
.05 level for judging significance. 
a. If the true change is a loss of 6 points, the probability (beta) of correctly
rejecting the null hypothesis is approximately ____ whereas if the
true change is a gain of 12 points, the value of beta is approximately
____.
b. If the true change is zero, the value of beta is ____.
c. If, instead of a two-tailed test, a one-tailed test were used, the
probability of making the type I error would be ____.
6.10. Given that a two-tailed test is appropriate, that we have chosen the .05
level of significance, that the obtained difference between two means is 3, and
that the standard error of the difference is 2.

a. Would we reject the null hypothesis?

b. What is the probability that we will make a type I error?

c. If the true or population difference were 6, what is the (approximate)
probability that we would correctly reject the null hypothesis?

d. If the true difference were zero, what can you say about the likelihood
of making a type II error?

6.11. Given that a sample yields 98 per cent of yeses to a question and that the
standard error of the percentage is 2. If we set the .99 confidence limits as 98
± 2.58(2) we arrive at the absurdity of an upper limit in excess of 100 per cent.
Why?
6.12. Suppose you draw a sample of size 3 (yielding scores of 90, 99, and 102;
mean = 97) from a population which you know to have a mean of 100. What
would be your best single estimate of the population variance?
6.13. Consider the following two statements: (a) the probability is .95 that
sample means will not deviate more than $1.96S_M$ from the population mean, and
(b) the probability is .99 that the population mean will not deviate more than
$2.58S_M$ from a sample mean. Which statement is false? Why?
6.14. "The true mean has a 95 per cent chance of falling in the 95 per cent
confidence interval for the true mean." This statement is incorrect. Why?
Restate it in correct fashion.
6.15. If the standard deviation for a very large (infinite) population equals that
for a small (finite) population, samples of the same size from the two populations
will lead to confidence intervals for the population means that are the same or
different in width? Why?


6.16. For normal distributions the sample mean and sample median are
[...]ation parameter). For fixed N, how will
[...] mean and also on the median? Why?


6.17. We did not discuss the sampling instability of percentiles. Do you think
that for a distribution of 200 scores the 55th percentile point will be more or less
stable in the sampling sense than the 95th percentile point?


CHAPTER 7


[...] assigns 12 persons to an experimental group
[...] control group, thus assuring independence for the two
[...] at random to an experimental
[...]; 12 persons by matching (by pairs)
[...] group. Both experimenters
[...] group means (their own groups) via the t test.


7.4. Given that the unbiased estimate of the st[...]
is 4, that the chosen level for judging significance is .01, that [...] test is
appropriate, and that for the df = N - 1 = [...] the value of t must
reach 2.50 for claiming significance at the adopted .01 level.
a. Under these circumstances the probability of co[...]
error is what?
b. [...] the population [...] need to be in
[...] error be exactly .50?
7.5. Given a mean of 50 based on 21 cases, with $s_M = 2$, ascertain the .95
confidence interval; also the .99 confidence limits.
7.6. When setting the .95 confidence interval for a population mean, the
procedure for small samples differs from that for large samples in what two
respects?

7.7. An experimenter knows that the σ for population A is 4 and the σ for
population B is 3. He draws a sample of size 10 from population A and a
sample of size 10 from population B. In testing the significance of the difference 
between the means of the two samples he uses large sample techniques. Is he 
justified in doing this? Why? 

7.8. An experimenter uses a sample of size 15. He wishes to test the hypothesis 
that the population mean is 100. However, he knows nothing of small sample 
techniques. If he uses large sample techniques for his test, will he increase or 
decrease his probability of making a type II error over what it would be if he 
used small sample techniques? Why? 


CHAPTERS 8 AND 9 
8-9.1. a. Using the data of Table VI, make a scatter diagram with "Ex"
on the y axis, intervals of size 5; and with "TMT" on the x axis,
with i = 3 and the first interval taken as 105-107 (interval sizes are
suggested in order to facilitate an exact check of the tallying and
subsequent computations).
b. From the scatter diagram, compute the correlation between "Ex"
and "TMT"; also compute the two means and the two standard
deviations.

c. Write the regression equation for predicting "Ex" from "TMT".
Draw the regression line on your scatter diagram.

d. Determine the error of estimate for predicting "Ex" from a knowledge
of "TMT".
e. What percentage of the variance in "Ex" is due to or associated with
variation in "TMT"?
8-9.2. Do exercises 8-9.1 with "CM" substituted for "TMT" (an appropriate
interval size for "CM" is rather obvious).
9.3. The standard deviation of difference scores (D) based on $z_x$ and $z_y$
($D = z_x - z_y$) is .40. What is the correlation between X and Y? How did you
get your answer?


9.4. Consider the general formula for the variance of a difference,
$S^2_{x-y} = S^2_x + S^2_y - 2r_{xy}S_xS_y$. Can you suggest a method for determining the r between two
sets of correlated scores?
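
The variance-of-a-difference formula in 9.4 can be rearranged to give r from three variances, $r_{xy} = (S^2_x + S^2_y - S^2_{x-y})/(2S_xS_y)$. The sketch below illustrates this one possible line of attack numerically; the scores are made-up illustrative values, the function names are hypothetical, and variances are computed with N (not N - 1) in the denominator.

```python
from statistics import fmean, pstdev, pvariance


def r_from_difference(x, y):
    """Correlation recovered from the variances of X, Y, and D = X - Y."""
    d = [a - b for a, b in zip(x, y)]
    return (pvariance(x) + pvariance(y) - pvariance(d)) / (
        2.0 * pstdev(x) * pstdev(y)
    )


def pearson_r(x, y):
    """Product-moment r computed directly from deviation cross-products."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


# Hypothetical paired scores, used only to show the two routes agree.
x = [12, 15, 9, 20, 17, 11]
y = [10, 14, 11, 18, 13, 9]

print(round(r_from_difference(x, y), 6))
print(round(pearson_r(x, y), 6))
```

Both routes rest on the same algebra, so they agree to within rounding; the difference route merely avoids forming cross-products.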

9.5. Consider Y as weight and X as height and that r has been computed and
that B and A have been calculated for the regression equation Y' = BX + A.
Now as regards the metric (or measurement units), Y is in pounds and X
is in inches. What can you say about the units (or metric) for A? For B?
For r?

9.6. Suppose we consider the regression lines, that for Y on X and that for X
on Y. If the assumption of linearity holds for both lines, under what condition
will the two regression lines
a. coincide?
b. be at right angles to each other?
c. both have negative slopes?


Table VI. Data for 38 students in a course on mental tests ("Ex" stands for [...]
examination scores; "TMT" stands for IQs based on Terman-McNemar Test of
Mental Ability; "CM" stands for scores on the Terman Concept Mastery Test)


 Ex   TMT   CM       Ex   TMT   CM       Ex   TMT   CM
 62   123   47      106   125  126       54   128    ?
107   129   59       79   109   33       86   132    ?
 87   131   78       84   120   56       92   114    4
 95   129    ?      100   129   81       67   113    ?
100   122   52       78   112   51      102   141  112
 87   136  127       90   132  110       79   132   72
 87   125    7       85   126   54       82   126   54
 64   121   46       58     ?   33       96   131    ?
 89   131   97      110   138    ?       77   131    ?
 58   128    7      115   131  138       75   109    2
 84   123   28       68   129   39       93   131    ?
 80   127   53       78   123   67       67   106   25
 82   120   53       80   136  101


9.7. Occasionally we encounter a statement which goes something like this:
the relationship seems to be higher for the low scorers than the high scorers.
What does this imply?


9.10. Given: $M_x = 40$, $M_y = 50$, $S_x = 8$, $S_y = 6$ and r = [...]. What will
the mean and the standard deviation be for the sum score, X + Y?
9.11. In what ways can X be manipulated without changing the correlation
between X and Y?


9.12. Suppose that the score on a first quiz is to be combined with the score
[...] to give equal weight to the two quizzes. Why might a
simple addition of the two scores for an individual fail to accomplish the desired
equal weighting?
9.13. We saw that r² was equal to the ratio of two variances, hence providing
a percentage interpretation of correlation. Algebraically, by taking square roots
we have r as the ratio of two Ss. Why cannot the latter ratio be safely interpreted
in percentage terms?
9.14. Numerically an r of .60 is twice an r of .30; under what circumstances, as
regards interpretation, can we regard

a. .60 as exactly four times .30?

b. .60 as nearly four times .30?

c. .60 as being twice .30?
9.15. Test Y has an S = 10. The S of the predicted Ys from X ($S_{y'}$) = 6. What
is the standard error of estimate ($S_{y \cdot x}$)? What is the correlation between X and
Y?
9.16. A critic of the text has said that $\Sigma(y - y')^2/N$ does not qualify as a
variance unless it is first demonstrated that $\Sigma(y - y') = 0$. Can you supply
a very, very simple algebraic proof that $\Sigma(y - y')$ does equal zero?
9.17. Given that the correlation between X and Y for Group A is .60 and for 
Group B also .60. Under what specific conditions would the correlation between 
X and Y for the two groups combined be much higher, say, .80? Ditto, much 
lower? 
9.18. Although it is indubitably true that an r must reach .866 to reduce the 
error of estimate by 50 per cent of what it would be for an r of zero, do you 
see another way of looking at the situation which does not make the "picture so 
black"? Hint: Imagine the two scatter diagrams, one for an r of .866 between, 
say, W and Y, the other for an r of zero between W and X, with W being a 
variable to be predicted and Y and X being possible predictors. Now suppose 
a person 2 standard score units above average on both Y and X. 


CHAPTER 10 
10.1. An N of 101 will yield .10 as the standard error for near zero rs. If 
we have adopted the .05 level of significance and are using a two-tailed test: 
a. What is the probability of making a type I error if the population r is 
zero?
b. What is the probability of making a type I error if the population r is 
+.06? 
c. What is the probability of making a type II error if the population r is 
—.20? 
d. How often would we correctly reject the null hypothesis if the popula- 
tion r were +.20? 
10.2. For large samples the sampling distribution of low rs may be regarded
as normal, with standard error of $1/\sqrt{N - 1}$. Suppose in what follows that N
is 101, thus giving a $\sigma_r$ of .10.
a. If the null hypothesis of no correlation for the population being
sampled is true, the probability of a sample r exceeding +.20 is
approximately what?
b. When the r in the population is .10 and we are using a two-tailed
test and the .05 level of significance, the probability of committing
the type II error is approximately what?

c. If a population r is .26 and we are using a one-tailed test and the .01
level, the probability of correctly rejecting the null hypothesis is
approximately what?

d. If a sample r is .15, the .95 confidence limits for the population r are
approximately what?
e. For part d, the probability that the limits so set will not include the
population value is what?
10.3. The classical formula $(1 - r^2)/\sqrt{N - 1}$
implies that the degree of sampling stability for an r [...]
an r = .30, yet by the z transformation the standard [...]
[...]sponding zs are the same.
[...] of an r of .30 and one of .90.


10.5. It was argued that the degrees of freedom, N [...]


[...] standard (sampling) error for the
correlation based on 100 cases?
c. If one variable were curtailed so as to lead to a 50 per cent reduction
in its standard deviation, what would you expect as a sample value
for r?


10.7. As regards the shape of their random sampling distributions, proportions
and correlation coefficients have what in common under conditions A (specify)
and what in common under conditions B (specify)?
10.8. For Group 1 the r between two variables is fou[...]


10.9. Suppose an r of .90 based on 16 cases, and suppose that we establish the
P = .95 confidence limits first by use of the classical standard error of r and then
by way of the z transformation. Which method will yield the higher upper
bound? Why? (No arithmetic called for.)

10.10. Distinguish between $S^2_x(1 - r^2)$ and $S^2_x(1 - r)$. What subscripts are
needed for the rs?

10.11. Suppose you are to choose between two equally well standardized tests
of reading comprehension to use in a school system. Test A has a reliability
coefficient of .95 and an $S_e$ of 4, whereas Test B has a reliability of .91 and an $S_e$
of 3. Which would you choose and why?

10.12. Test A yields an S of 12 and an $S_e$ of 3, whereas Test B yields an S of
20 and an $S_e$ of 4. Which test is the more reliable? Why?

10.13. It was argued that one result of measurement errors is a regressive effect. 
For example, those testing between 130 and 134 IQ points today will tend to test 
nearer average (100 points) tomorrow whereas those between 70 and 74 today 
will have regressed upward tomorrow. This would seem to imply that the group 
variability on tomorrow's testing will be less than the group variability today. 
Explain why this expected change in group variability does not take place even for
very unreliable tests. 

10.14. Suppose Form A and Form B are strictly parallel forms. This means
that $M_a = M_b$, $S_a = S_b$ (both $= S_x$), and are similar in content; hence $r_{xx}$, the
reliability coefficient, is simply $r_{ab}$. We showed that a true score, given an
obtained score, $x_a$, is best estimated by the equation $x'_t = r_{xx}x_a = r_{ab}x_a$ with an
error of estimate given by $S_x\sqrt{r_{ab} - r_{ab}^2}$. It also follows from our
discussion of r and regression equations that the best estimate of a score on
Form B, given an obtained $x_a$, is by $x'_b = r_{ab}x_a = r_{xx}x_a$ for which the standard
error of estimate is $S_x\sqrt{1 - r_{ab}^2}$. Obviously, the regression estimates of
$x_t$ and $x_b$ lead to identical results, but the error of estimate for the latter is always
larger than that for the former, as can be seen by examining the two error
formulas. How would you account for the apparent paradox that two estimates
which lead to precisely the same value have differing degrees of precision?
10.15. For $X = X_t + E$ we have $S^2_x = S^2_t + S^2_e$. Now when we solve
$X = X_t + E$ for $X_t$ we get $X_t = X - E$, whence by the variance theorem we
might write $S^2_t = S^2_x + S^2_e$, which is inconsistent with $S^2_x = S^2_t + S^2_e$. What is
wrong?

10.16. Suppose two forms of a test, A and B, and that the form vs. form
reliability ($r_{ab}$) is .91 and $S_a = S_b = 20$, thus leading to a standard error of
measurement of 6.

a. If we use scores obtained by summing the two form scores we will 
have scores with a reliability of how much? How did you arrive at 
this? 

b. Would the reliability coefficient be different if you averaged the two
form scores instead of merely summing them? Why? 

c. Would the averaged values have a larger or smaller standard error 
of measurement than that for the summed values? Why? 

10.17. It is sometimes said that intelligence can be held constant by choosing 
individuals (in forming a group) with the same IQ, or the same score, on an 
intelligence test. Any comment? 


10.18. What logical connection do you see between the reliability of gain scores
($G = X_1 - X_2$) and $r_{12}$ corrected for attenuation?


10.21. It can be shown that the correlation between X [...] and $X_p$ (= [...])
[...] when $r_{12}$ = zero. Under what
[...] +.707 when $r_{12}$ is zero?


[...]

M     60    80    50   190
S     10    12    15    25


With only the foregoing information at your disposal, would you expect $X_1$ to
correlate higher or lower than $X_3$ with $X_p$? Why?


[...] between the four variables, U, V, X, and Y as
follows:
[...] constant,
[...] constant.


[...] R by using a percentage scoring scheme: 100 Y/R. Can
you suggest another (and perhaps better) scheme which would take care of
individual differences in R in the sense of yielding a score which is truly
independent of R?


10.25. If 65 per cent of the variance in Z is associated with variation in X and
70 per cent of the variance in Z is associated with variation in Y, what can be said
about the correlation between X and Y? Why?


10.26. If $r_{12}$ = .80 and $r_{13}$ = .70, how might we explain without recourse to
p. 167 of the text that r [...]


10.27. If $r_{12} = .60$, $r_{13} = .60$, and $r_{23} = .00$, the partial correlation $r_{12.3}$ becomes .75;
that is, the correlation between variables 1 and 2 goes up when variable 3 is
"partialed out" even though 3 is uncorrelated with 2. How would you explain
this?
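
The .75 quoted in 10.27 can be checked against the usual first-order partial correlation formula, $r_{12.3} = (r_{12} - r_{13}r_{23})/\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$. The short sketch below is offered only as an arithmetic check; the function name is hypothetical, and the code does nothing toward explaining why the value rises.

```python
from math import sqrt


def partial_r(r12: float, r13: float, r23: float) -> float:
    """First-order partial correlation between 1 and 2 with 3 held constant."""
    return (r12 - r13 * r23) / sqrt((1.0 - r13**2) * (1.0 - r23**2))


# Values stated in exercise 10.27.
print(round(partial_r(0.60, 0.60, 0.00), 2))
```

With $r_{23}$ at zero the numerator is left untouched, so any change in the coefficient must come through the denominator.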

10.28. We noted that the correlation coefficient is affected by heterogeneity 
with respect to one or both of the variables being correlated and with respect to a 
third variable, and we developed formulas which would correct for heterogeneity. 
Now suppose we have an $r_{xy}$ based on 100 boys and 100 girls (N = 200).

a. Then by making separate sex distributions for X we find a marked
sex difference; how might our $r_{xy}$ be influenced by this sex difference?
Why?

b. Next it is discovered that there is also a sizable sex difference on Y.
Considering now that both variables show sex differences, what can
you say about the effect of such differences on $r_{xy}$? Again, why?

c. Can you propose (in rough outline) a scheme for getting rid of the
sex effect on $r_{xy}$?
CHAPTER 11 


11.1. How would you show that the choosing of multiple regression coefficients 
so as to minimize the error of prediction tends to maximize multiple r? 
11.2. If a criterion measure has a reliability of only .60, what is the limiting
percentage of the criterion's variance that is predictable by any single predictor or
any combination of predictors?
11.3. Suppose in determining the beta coefficients for a 3-variable multiple
correlation problem your calculations led to $\beta_2 = .70$ and $\beta_3 = .80$.

a. Why might we suspect error in your computations?

b. But under what conditions might your values be correct?
11.4. Consider the hypothetical multiple correlation situation involving a
dependent variable, $X_1$, and two independent variables, $X_2$ and $X_3$, each of which
correlates .707 with the dependent variable. If $r_{23} = 0$, and then a fourth
variable is found which also correlates .707 with the dependent variable, what
can you say regarding $r_{24}$ and $r_{34}$? Will both be zero or not? Why? (Negative
hint: do not waste time substituting in formulas.)
11.5. Suppose a 26-variable multiple regression problem with each of the 25 
possible predictor variables correlating .20 with the criterion variable and 
intercorrelating zero among themselves. Can you specify the value of the 
multiple r? How did you get your answer? (Obviously, you are not expected 
to answer this by using the Doolittle solution, so look for a “trick” solution.) 
11.6. If each of m independent and uncorrelated variables yields a correlation
of .30 with a dependent or criterion variable, how many of them would you need 
in order to build up a multiple r of .45? Of .90? 
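
For the special case treated in 11.5 and 11.6, predictors that intercorrelate zero, the squared multiple r reduces to the sum of the squared correlations with the criterion. The sketch below assumes that standard result; the function names are hypothetical, and the count returned is the smallest whole number of predictors that reaches or exceeds the target.

```python
from math import ceil, sqrt


def multiple_r_uncorrelated(validities):
    """Multiple r when the predictors intercorrelate zero.

    Assumption: R squared equals the sum of squared predictor-criterion rs.
    """
    return sqrt(sum(r * r for r in validities))


def predictors_needed(r_each: float, target_r: float) -> int:
    """Smallest number of equally valid, uncorrelated predictors giving R >= target."""
    ratio = target_r**2 / r_each**2
    # Round first so floating-point noise (e.g. 9.000000000000002) is not rounded up.
    return ceil(round(ratio, 9))


print(multiple_r_uncorrelated([0.20] * 25))
print(predictors_needed(0.30, 0.45))
print(predictors_needed(0.30, 0.90))
```

Because R cannot exceed 1.00, the sum of squared correlations sets a ceiling on how many such uncorrelated predictors can exist at a given level of correlation with the criterion.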
11.7. Given the following regression equation in raw score form: 


$X'_1 = 13.2X_2 + 39.1X_3 + 18.5$


What are the possible factors that might be responsible for 39.1 being treble 13.2?
11.8. We learned that the β weight for a suppressant variable tends to be
negative. Suppose a 3-variable (one dependent and two independent) multiple
regression equation in which $\beta_3$ is negative (say, -.40). Does it follow that
variable 3 is a suppressant? Why?


11.9. When we come to the analysis of variance test of the significance of
multiple r we will learn that a multiple r based on N cases and m independent
(or predictor) variables will have associated with it a specifiable number of
degrees of freedom. Can [...]
for your answer?


11.10. Frequently the clinician will utilize a difference score, X - Y, as a
basis for predicting. Examples: in Rorschach this might be M - C; in the
Babcock scheme for measuring mental deterioration the average standing on
[...] score on a vocabulary test; on the Wechsler-[...]
[...] can you make? (Note: ther[...]
involved here.)


CHAPTER 12


[...] individuals into a dichotomy, the
[...] is frequently [...]
percentage agreement. In terms of the [...] judge I vs. judge [...]
with frequencies A, B, C, and D, this [...] equivalent to dividing B + C
[...] frequency in either the upper-left
or lower-right hand cell. Under what circumstance [...]
happen? (Your answer should be in terms of obs[...])
[...] tion would you use
between sex (as near a point variable as one enco[...]
passing or failing a test item? Why?

12.5. We discussed eta, the correlation ratio. What do you suppose "[...]
eta" stands for? Write out a reasonable guess as to the formula for such a
measure.


12.6. Occasionally we find instances in which the correlation between the
percentile scores on two tests has been computed. Such a correlation coefficient
resembles one type of correlation discussed somewhere in the text. Which?
In what way?


12.7. Consider two variables, X and Y. Does saying that the two are
uncorrelated (product-moment sense) necessarily mean that Y and X are independent?
Explain.

12.8. Sometimes a skewed distribution of scores can be approximately normalized
by a simple transformation, such as log X or the square root of X. Now
suppose $r_{xy} = .60$ and that linearity of regression holds exactly. If the square
root transformation is used on X, would you expect the correlation of these new
X values and Y to be higher or lower than .60? Why? What change would you
expect if for both the original and transformed Xs we used the rank difference
method in determining the correlation between the two variables?

12.9. Suppose we have a plot of the relationship between two variables and that
the form of the relationship seems to be logarithmic. Accordingly, we write
an equation for predicting Y from X as $Y' = B \log X + A$ with B and A so
determined as to minimize $\sum (Y - Y')^2$. Obviously we can ascertain the variance
about the curved line by computing $\sum (Y - Y')^2 / N$, the square root of which
would be analogous to the standard error of estimate. Next we define an index of
relationship, say theta, as

$$\theta^2 = 1 - \frac{\sum (Y - Y')^2 / N}{S_y^2}$$

We also compute r and eta. Problem: arrange theta, eta, and r as to magnitude.
Basis?
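The three indices of exercise 12.9 can be computed side by side on made-up data, as in the sketch below. Everything about the data is an assumption introduced for illustration (eight X levels, 25 cases per level, a logarithmic trend with normal error); the helper names are hypothetical, and r squared is obtained as 1 minus the residual ratio for the straight-line fit, which is algebraically the same as the squared product-moment r.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def least_squares(u, y):
    """Slope B and intercept A that minimize sum (y - (B*u + A))**2."""
    mu, my = mean(u), mean(y)
    sxy = sum((a - mu) * (b - my) for a, b in zip(u, y))
    sxx = sum((a - mu) ** 2 for a in u)
    slope = sxy / sxx
    return slope, my - slope * mu


def one_minus_residual_ratio(y, predicted):
    """1 - [sum (Y - Y')^2 / N] / S_y^2, where S_y^2 = sum (Y - M_y)^2 / N."""
    my = mean(y)
    residual = sum((a - p) ** 2 for a, p in zip(y, predicted))
    total = sum((a - my) ** 2 for a in y)
    return 1.0 - residual / total


# Hypothetical data: 8 levels of X, 25 cases per level, logarithmic trend plus error.
random.seed(0)
x = [level for level in range(1, 9) for _ in range(25)]
y = [10.0 * math.log(level) + 5.0 + random.gauss(0.0, 3.0) for level in x]

# r squared: predictions from the best-fitting straight line on X.
b, a = least_squares(x, y)
r_sq = one_minus_residual_ratio(y, [b * v + a for v in x])

# theta squared: predictions from Y' = B log X + A.
log_x = [math.log(v) for v in x]
b_log, a_log = least_squares(log_x, y)
theta_sq = one_minus_residual_ratio(y, [b_log * v + a_log for v in log_x])

# eta squared: predictions are the array (column) means of Y.
array_means = {
    level: mean([yy for xx, yy in zip(x, y) if xx == level]) for level in set(x)
}
eta_sq = one_minus_residual_ratio(y, [array_means[v] for v in x])

print("r =", round(math.sqrt(r_sq), 3))
print("theta =", round(math.sqrt(theta_sq), 3))
print("eta =", round(math.sqrt(eta_sq), 3))
```

Changing the seed or the form of the trend shows how the three values move relative to one another, which may help in checking a reasoned answer.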

12.10. Suppose a scattergram between X and Y (both measured in a graduated
fashion, and both yielding symmetrical distributions) and that we compute r,
eta, point biserial r, and the fourfold point r (the last two by dichotomizing near
the median or medians). Arrange these measures in order for expected magnitude.

12.11. One form of the formula for biserial r contains $M_2 - M_1$, the other
contains $M_2 - M_t$. Now suppose the number of cases in category 1 is so small
that $M_1$ is very unstable in the sampling sense, hence the computed correlation
must also be unstable. Why does not the use of the second form, which avoids
the unstable mean, lead to a more stable r?

12.12. Suppose a discriminant-function weighting of six tests for differentiating
delinquents from nondelinquents; and suppose that for those having a weighted
(total) score between, say, 160 and 170 it is noted that 90 are nondelinquents
and 10 are delinquents. What can you say about the predictive value of scores
between 160 and 170? Any cautions?
CHAPTER 13 
13.1. Under what circumstance is the chief assumption underlying the chi
square technique violated?

13.2. In their 1949 Psychol. Bull. article on “The use and misuse of the chi-
square test,” Lewis and Burke cite nine different types of errors frequently
made in using chi square. One of these is “neglect of frequencies of non-
occurrence.” Can you illustrate what is meant by this?

13.3. Specify briefly the types of situations for which it is easy to substitute the
simple binomial for chi square as a test of significance.

... square, Pearson and his followers claimed that
the df for a fourfold table was 2. R. A. Fisher ...
ensuing argument by pointing out that the df had to be 1 in order for a chi square
P to agree with another (well-established) significance test applied to the same
fourfold table. Query: What other test? Be specific.

13.7. Occasionally in a contingency table a cell may have zero frequency.
Should this lead to any changes in using the chi square technique for testing
the hypothesis of independence? If so, what? If not so, why not?


Table VII

                    Item 2
                  F     P   Total
Item 1     P      0    10     10
           F     14     6     20
        Total    14    16     30


13.8. Given pass (P) and fail (F) information for two test items in Table VII.
Consider testing the two hypotheses: $H_1$ for independence and $H_2$ for difference
in difficulty. For which would it be more important to use an exact probability
test? Why? Be specific.


13.9. Specify (by example) a situation for which the df for chi square is k - 1
for k observed frequencies and then a situation for which the df is also k - 1
but there are 2k observed frequencies.


13.11. A study is made to learn whether the items on the 1937 Stanford-Binet
have the same difficulty values as during the standardization testing period,
1931-33. An item on which 5-, 6-, 7-, and 8-year-olds are tested might yield
the difficulties (percentages for passing) given in Table VIII.

Table VIII

Age               5     6     7     8
1931-33    N    101   203   202   203
           %     26    41    58    71
1955       N     90   120   130   115
           %     20    37    49    68


Needed: An over-all test for the significance of the differences between the two 
periods of testing. How would you do it? Indicate just how you would set up 
your work sheet for making the calculations. 

13.12. Suppose an experiment involving two groups of rats, one of which 
(Group A) has been on a standard diet, the other (Group B) has been on a diet 
which is deficient in Vitamin A. Let $N_A$ = 30 and $N_B$ = 20. In order to see
whether the vitamin deficient group will choose a food with high natural vitamin 
content (say food X) in preference to food Y of low content, the members of each 
group are given a chance to choose on each of 5 successive days. In Group A 
there are 90 preferences for food X and in Group B there are 70 preferences for 
food X. Now our investigator has heard of the chi square test. He reasons 
that by chance 50 per cent of the choices in each group would be for food X, 
hence for Group A 75 of the 150 choices would be for X and in Group B 50 of the 
100 choices would be for X. Thus he arrives at the following table: 


                      A     B
Chance for X         75    50
Observed for X       90    70


from which he calculates chi square as $(90 - 75)^2/75 + (70 - 50)^2/50 = 11$. He
next worries about the number of degrees of freedom, and seeing no restrictions
decides that df = 2. Hence he finds P to be between the .01 and .001 levels of
significance. In what ways, if any, can the statistical treatment be criticized?
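The investigator's arithmetic in 13.12 (as distinct from the soundness of his reasoning, which is what the exercise asks about) can be retraced with a few lines. The sketch below only reproduces the sum he formed from his own table; the function name is hypothetical.

```python
# Observed and "chance" counts of food-X choices, exactly as the investigator tabulated them.
observed = {"A": 90, "B": 70}
expected = {"A": 75, "B": 50}


def investigator_chi_square(observed_counts, expected_counts):
    """Sum of (observed - expected)^2 / expected over the investigator's two cells."""
    return sum(
        (observed_counts[group] - expected_counts[group]) ** 2 / expected_counts[group]
        for group in observed_counts
    )


chi_sq = investigator_chi_square(observed, expected)
print(chi_sq)  # 3.0 + 8.0 = 11.0
```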
13.13. Given the fourfold tables in Table IX for passing (P) and failing (F) test
items:

Table IX

                Item 2                        Item 4
               F     P                       F     P
Item 1    P   20    40        Item 3    P    0    10
          F   30    10                  F   10     7

a. Specify two meaningful null hypotheses that are testable, via the chi
square technique, with the 1-2 item table (formulas not called for).
b. Why would it be unsafe to test, via chi square, similar hypotheses
regarding items 3 and 4? (N small is not a very sophisticated answer.)
c. Indicate alternative (and better) methods for handling the two hypotheses
re the data for items 3 and 4.


13.14. Most of the items on the Strong Interest Inventory involve L (like),
I (indifferent), and D (dislike) response categories.

a. Suppose we have the responses from two independent groups on item
i and we wish to test the hypothesis that the two populations from
which our groups have been drawn are alike in their responses.
Indicate the tabular setup and the techniques you would use in
testing this hypothesis.

b. Suppose item i and item j with responses from a single group and we
wish to test the hypothesis that the responses to the two items are
independent. Tabular setup? Technique?

c. How would you go about testing the hypothesis that the “L” responses
to item i differed from the “L” responses to item j for a single
group?

13.15. The following statement may be found in the text chapter on chi square.

“For ns larger than 30, the expression $\sqrt{2\chi^2} - \sqrt{2n - 1}$ will have a sampling
distribution which will follow very closely the unit normal curve. The probability
is accordingly .05 that this expression will exceed +1.64, and .01 that it will
exceed +2.33, by chance.”

a. To what does the z refer?

b. Why the concern with plus values only?

c. How do you reconcile the use of 1.64 instead of 1.96 when you
consider that 1.96 is for a two-tailed test, whereas chi square, under
the appropriate condition, must equal 3.84 (or $1.96^2$) for a
two-tailed significance level of .05?

13.16. In the earlier days chi square was typically regarded as a technique for
testing the “goodness of fit” of curves. What kind of curves? Why not other
kinds?

13.17. In general, would you expect chi square for a 2 x 2 table to be larger or
smaller than for a 3 x 3 table? Why? Under what circumstance could the
reverse be true?
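The expression quoted in 13.15 is easily evaluated. The sketch below assumes that n in the quotation denotes the degrees of freedom of the chi square; the function names are hypothetical, and the second function simply solves the expression for the chi square that corresponds to a given normal deviate.

```python
import math


def normal_deviate_from_chi_square(chi_sq, n):
    """sqrt(2 * chi^2) - sqrt(2n - 1), with n assumed to be the degrees of freedom."""
    return math.sqrt(2.0 * chi_sq) - math.sqrt(2.0 * n - 1.0)


def chi_square_for_deviate(z, n):
    """Chi square at which the expression above equals z (the inverse relation)."""
    return (z + math.sqrt(2.0 * n - 1.0)) ** 2 / 2.0


# Chi square that the approximation places at the +1.64 and +2.33 points for n = 40.
for z in (1.64, 2.33):
    print(z, round(chi_square_for_deviate(z, 40), 2))
```

The values printed can be set beside the tabled chi squares for 40 degrees of freedom to judge how close the approximation runs.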


13.18. Recently the author read a manuscript which involved data on a control
group and three groups under three experimental conditions, a total of four
groups with individuals assigned at random to the groups. The response data
were dichotomous, hence chi square was used. A total of three fourfold tables
(one for each experimental condition against control) were set up, from which
three chi squares were calculated, each with 1 df. Then the three chi squares (and
the dfs) were added for a total chi square based on 3 df. Any comment? (Note:
no low expected values involved.) Be specific.


CHAPTER 14


14.1. Suppose that in reading an older (circa 1920) study you find an $M_D$ and
$S_D$ (S of distribution of difference scores) based on N = 10. You wish to
re-evaluate $M_D$ via the t technique. If no actual scores are reported, how would
you proceed to get the needed unbiased estimate of the variance of the difference
scores?

14.2. In what way is the concept of "degrees of freedom" similar for chi square 
applied to frequencies and for the variance estimation situation? 

14.3. Typically, one tail (the right-hand side) of the chi square distribution is 
involved in tests of significance. 

a. Under what specific condition does this one tail provide a two-sided, 
or two-tailed, test? 

b. State two entirely different types of situations one of which requires 
using both tails of the chi square distribution and the other of which 
requires one tail and an alertness for the other. 

14.4. How could you use F to determine whether a variance of 25, based on 
a sample of N cases, deviates significantly from a hypothetical value of 16? 
14.5. In a certain textbook on statistical method you find the following data
for N = 30 cases for two forms of a test having .93 as its form versus form
reliability:

            Mean     $s^2$
Form A      44.4    193.2
Form B      42.8    146.3

To judge whether the two forms differ in variability, the author takes 193.2/146.3
as F. Any comment?

14.6. The distributions of scores on successive learning trials on a pursuit rotor 
typically increase in variance from trial to trial. Why is Bartlett’s test not applic- 
able for testing the differences among such variances? 

14.7. Given for the 1937 Stanford-Binet the following standard deviations (all 
testing done during March 1956, and no siblings): 


Age 3 Age 4 
Boys Girls Boys Girls 
Ns 50 48 49 51 
Form L 17.5 18.0 16.8 16.5 
Form M 16.2 17.2 16.3 17.6 


a. For the best test of the differences between Ss, the difference
17.5 - 16.2 would be tested by which formula?
18.0 - 16.5 would be tested by which formula?
17.6 - 16.3 would be tested by which formula?

a. Since .84 and .75 are both $S^2$ ... them into $s^2$ values?

b. Presuming that you have the ...

15.1. The tabled values of F ...

15.2. When we use F ... say, the means for four ex... (or requirement) of
independence arises in at least three different places. Can you specify?

... is a two-tailed test despite the ...ed value for the .01 level is actually
the .02 level when a two-tailed test is appropriate for testing hypotheses regarding
difference in variability of two groups.

... test.” In what way might this statement be regarded as partially true ...
to make it exactly true?

15.5. With reference to the variance estimates ... is it not permissible to
take F = ... the residual variance is greater than the within arr...

15.7. We had one F test in which the numerator involved eta squared minus $r^2$,
and another F test involving the difference between two multiple rs squared. In
what way are the two Fs similar (or analogous)?

15.8. We learned of an F test for $r_{1.23}$ and for r. Do you see an easy way to test
the significance of the point biserial r by an F test? How?

15.9. We have seen how the goodness of fit of a normal curve to frequencies
can be checked (or tested) via the use of the chi square technique, and we have
seen how the F test can be used to test the goodness of fit of linear regression.
Suppose we have height measurements on 100 children at each age, 1 to 18
inclusive (cross-sectional, i.e., we are not following the same children from 1 to
18). All measurements are within a week of a birthday. It is sometimes argued
that growth follows the Gompertz curve, the equation of which is a double
exponential function:

$$Y = v g^{h^X}$$

in which v, g, and h are constants determinable from the data, X is age, and Y is
height (of the children).

a. Set up a variance table with appropriate symbols to indicate the 
breakdown of the sum of squares (for Y), the degrees of freedom 
(actual numerical values), and the variance estimates (in symbols) 
which you would use to test the goodness of fit of the data to the 
Gompertz curve. Specify symbolically the F ratio you would use. 
b. In fitting a normal curve to a frequency distribution, we set up ex- 
pected frequencies. What in the fitting of a Gompertz curve would you 

consider as analogous to "expected" frequency ? 
c. Would your proposed scheme for testing the fit of the Gompertz curve 
be valid in case of a longitudinal study (i.e., we follow same children 


from ages 1 to 18)? Defend your answer briefly. 
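For readers unfamiliar with the double exponential form mentioned in 15.9, the following sketch evaluates a Gompertz curve at a few ages. The constants are hypothetical, chosen only so that the curve rises toward an upper limit v; nothing here bears on how the constants would be determined from data or on how the fit would be tested.

```python
def gompertz(age, v, g, h):
    """Double exponential Gompertz curve: Y = v * g ** (h ** X)."""
    return v * g ** (h ** age)


# Hypothetical constants (not estimated from any data).
V, G, H = 180.0, 0.3, 0.85

for age in (1, 6, 12, 18):
    print(age, round(gompertz(age, V, G, H), 1))
```

With 0 < g < 1 and 0 < h < 1 the exponent h ** X shrinks toward zero as X grows, so Y climbs toward v.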


CHAPTER 16 


16.1. If you mistakenly used simple analysis of variance for testing the difference 
between G means based on N sets of matched individuals, would you expect a 
smaller or larger P than would have resulted had you used the more appropriate 
two-way analysis of variance? Why? 

16.2. Consider the ordinary z test for the difference, via $M_D$, between correlated
means. What in this z test corresponds most closely to “interaction”? Be more
specific than a mere statement that “it” is found in the error term or in the
numerator or in the denominator or in the correlation between scores.

16.3. Given information regarding sex difference at birth for large samples of
American and of British babies. The data can obviously be placed in a two-way
analysis of variance setup, which would permit testing for a nationality difference
as well as for a sex difference. The sex by nationality interaction could also be
tested by the analysis of variance method. Could this interaction have been
tested prior to the invention of the F technique? How?

16.4. During our discussion of chi square we did not mention “interaction.”
Suppose the following data for number of subjects who overcome “set” in the
water-jar test:

70 of 100 male science majors
60 of 100 female science majors
40 of 100 male history majors
50 of 100 female history majors

Can you specify an interaction, and how would you proceed to test it (the
interaction) for statistical significance? (Note: this is not a small sample
situation.)


Scores per cell)? Why? 


16.7. In order to ascertain the effect of varying the color of the stimulus patch
on critical flicker fusion you might plan to use 10 subjects, each measured 6 times
under each of four color conditions: red, blue, yellow, and green (brightness
controlled). Set up a schematic variance table, with sources, variance estimates,
and numerical dfs. Indicate legitimate Fs (actual variance ratios, in symbols)
that can be used to test hypotheses, and specify the generality of conclusions you
would draw from possible significant Fs.

16.8. Suppose you are consulted by a statistically naive individual who has
the notion that he can plan a study having to do with the effects of height and
weight on basal metabolism by using a factorial design with height and weight
as the basis for a two-way classification.

a. Any warnings to him regarding possible difficulties in such a design?

b. What plan would you suggest instead?

16.9. Suppose researchers A and B both start with 12 litters of rats, 4 rats per
litter. Both use identical T mazes. Researcher A splits each litter randomly so
as to have four groups which are run under four different degrees of food depriva-
tion. To test the between groups differences (deprivation effects), A computes an
F with an interaction variance as “error.” Researcher B runs all his rats under
one condition, then calculates an F as the between litters variance estimate
over the within litters variance estimate. (Each rat in both experiments has just
one score.)

a. Specify the degrees of freedom for A's F and for B's F.

b. Why did A use an interaction variance estimate as “error” whereas B ...

shown the movie, and a control group was not. Both groups were pretested with 
the questionnaire, and after the experimentals had seen the movie both groups 
were retested. The Ns were 50 and 90.

Table X. Summary statistics

                       Experimental                      Control
Mean, pretest          23.55, S = 2.87                   26.54, S = 2.26
Mean, posttest         16.40, S = 4.10                   27.06, S = 2.79
Test-retest r          .64                               .84
Difference in means    7.15                              .52
                       $S_{D_M}$ = 3.16, z = 2.26        $S_{D_M}$ = 1.63, z = .33

It was concluded that because the experimentals show a significant (at P = .03 
level) difference, whereas the controls do not, the movie did lead to changes 
in attitude. Do you see anything wrong with his statistical treatment? 

16.11. Because a smaller F is needed for a prescribed level of significance as the 
df for the denominator becomes larger, it has been argued frequently that we 


should strive to increase this df. 
a. Consider Design A with, say, 20 cases in each of two experimental 


groups having been assigned randomly and independently to the 
two groups in contrast to Design B in which we also have 20 cases 
per group, but the groups have been matched on a thought-to-be 
relevant variable by setting up 20 pairs of individuals. Which design 
provides the greater df and why might it be unwise to use the design 
with the larger df? 

b. Consider the test for the significance of linear correlation. As we 
outlined the procedure, the df for the denominator variance (residual) 
was N — 2. Now in some texts we can find that the denominator 
variance is taken as the within array variance about the array mean 
with df = N — G where G is the number of arrays. Obviously, the 
dfs differ according to which variance is being used as “error.” Why 
might the test based on N — 2 df be no more apt to lead to significant 
Fs than the one based on N — G df? 

c. Part b involves a within array variance about the array means and a
within array variance about a regression line. Why can’t we test the 
difference between these two variances by taking F as their ratio? 
Do you see a possible indirect method for making an inference 
regarding their difference? 

16.12. Consider a two-way layout for the scores of 30 persons on C = 3 forms 
of a test. 

a. The remainder term will provide an estimate of error of measurement 
variance with how many (numerical value, please) degrees of freedom? 

b. If your statistical clerk regarded the data as simply 30 groups of
scores with m = 3 per group, he (or she) would have a possibly
different estimate of error of measurement variance in the within
persons term, which would have (numerical) how many degrees of
freedom?

c. The “possibly different” above implies what possible major source
of difference, aside from dfs, in these two ways of estimating error
variance?

d. How would you test the difference between form means? 

e. How would you test the significance of the difference between the 
three form standard deviations? (Note: the available method may 
not be entirely satisfactory for a reason which you might specify.) 

16.13. Lay out a series of possible plans for studying the effect of illumination 
and the effect of foveal versus peripheral vision on critical flicker fusion (CFF). 
Let us agree to use five levels of illumination and four areas of retinal stimulation 
(foveal plus three areas proceeding outward from the fovea toward the periphery).
For each approach, indicate the sources of variation, the dfs, the variance 
estimates, and appropriate Fs. Evaluate the relative merits of your several plans. 


(Note: CFF is not affected by practice and can be measured in a couple of 
minutes.) 


16.14. In an issue of the Journal of Consulting Psychology will be found a
study of two groups (paranoid schizophrenics and normals), 27 cases per group.
We are told that “the groups were equated for age, education, and intelligence.”
The dependent variable is a measure of “distortion” of responses to four stories.
The authors present an analysis of variance table, given here as Table XI.


Table XI. Analysis of variance

Source                 df     SSqs     Var. Est.     F        P
Groups                  1     13.74     13.74      19.35    <.01
Stories                 3     17.81      5.94       8.37    <.01
Individuals            53    144.87      2.73       3.85    <.01
G by S interaction      3      4.56      1.52       2.14    >.05
Error                 144    101.73       .71

Total                 204    282.71


What, if anything, is wrong with their statistical analysis? 
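The printed entries of Table XI can be retraced from the sums of squares and dfs alone. The sketch below does no more than that: it forms each variance estimate as SSqs/df and each F as a ratio to the (rounded) error estimate, and it takes no position on whether the authors' breakdown or choice of error terms is appropriate. The dictionary name is hypothetical.

```python
# Sums of squares and degrees of freedom as printed in Table XI.
table_xi = {
    "Groups": (13.74, 1),
    "Stories": (17.81, 3),
    "Individuals": (144.87, 53),
    "G by S interaction": (4.56, 3),
    "Error": (101.73, 144),
}

# Variance estimate = sum of squares / df, rounded to two places as in the table.
variance_estimates = {
    source: round(ss / df, 2) for source, (ss, df) in table_xi.items()
}

# Each F as the ratio of a variance estimate to the rounded error estimate.
error_variance = variance_estimates["Error"]
f_ratios = {
    source: round(estimate / error_variance, 2)
    for source, estimate in variance_estimates.items()
    if source != "Error"
}

for source, f_value in f_ratios.items():
    print(source, variance_estimates[source], f_value)
print("Error", error_variance)
```

The rounded error estimate, 101.73/144, comes to .71, and the four Fs agree with the printed column when that rounded figure is used as the divisor.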
16.15. In practically ... determines the variance estima... than the variance
chosen as “error” for the denominator. Occasionally the numerator variance is
so small relative to the chosen “error” variance that F taken upside down is
significant. In discussing this, another text says “The situations where the F
obtained in this manner is significant probably have no reasonable interpretation
other than that they are the occasionally significant values which are to be
expected in random sampling.” Any comment?

16.16. A significant upside down F has its counterpart in what kind of a chi
square?

16.17. Do you think it possible, in a three-way fixed effects design, to have
(a) three-way interaction without two-way interaction and (b) vice versa?
Explain. (The use of diagrams may help here.)

16.18. It has been suggested that when an interaction is significant (fixed
constants), we should proceed to a series of analyses of variance one lower in
order. That is, a significant R x B x C interaction should be followed by,
say, B two-way analyses involving rows and columns; and a significant R x C
interaction should lead to, say, R one-way analyses. What possible sense can you
make of this suggestion?
16.19. The author of a recent letter criticizes this textbook for advocating, 
without qualifications, the use of, e.g. 5° for testing column effects in an 
[а,4,4.] design. He says that 52,, should be used only when it is larger (not 
necessarily significantly larger) than s?» Stated differently he says that if s*,, is 
smaller than s?,,, the latter should be used as “error.” Obviously, his worry is 
about those situations for which 5*,, and s®,,, do not differ significantly. He 
mentions “pooling” after making the point that “one could use either term," i.e., 
either 5°,, or s?,,, as “error.” Aside from “pooling,” and in rather simple 
commonsense (but not statistically naive) terms, 
a. what argument do you think he set forth in favor of using s*,, only 
when it is larger (though not significantly so) than s?,, as "error" 
for F? 
b. What argument would you use against his proposal? Note that this 

part can be answered even though you cannot answer part a. 
16.20. Do you see any possible way of utilizing the analysis of variance tech- 
nique for testing the hypothesis that the correlation between two variables is 
perfect within limits imposed by errors of measurement? This is the correction 
for attenuation problem under another guise. We intend this to be a hard 
question, and the only hint is to think in terms of standard scores for the two 
variables being correlated. 

CHAPTER 17 


17.1. Occasionally, linear and quadratic trend analysis has been used in
situations involving an independent “variable” that is actually qualitative but for
which the investigator argues that in terms of the expected outcome for the
dependent variable the “levels” on the independent variable can be ordered.
Accordingly the qualitatively different levels are treated as though on a scale with
equal spacings (distances) from level to level as a basis for trend analysis. Do you
see any difficulty in this procedure?

17.2. In a recently published textbook is found an example in which five
subjects are measured on three successive trials. The breakdown of the total sum
of squares quite properly leads to sums of squares for trials (df = 2), for subjects
(df = 4), and for subject by trial interaction (df = 8). The sum of squares for
trials is 90.00, and the sum of squares for linear trend is also 90.00 (df = 1).

a. What implication does this equality of the two sums of squares have
for a possible quadratic component?

b. The author failed to follow through with any implication regarding
the consequence of having unequal dfs (2 and 1) for the trial and the
linear component sums of squares (of the same amount). What
helpful remark could he have made?

17.3. Consider the situation for which we have just three levels and linear and
quadratic components have been “taken out.” ... the results, about the fit of
a quadratic equati...

... made prior to p. 354 of the text. How ... involving a single linear trend
based ...


17.5. Suppose the r between Y and X is .32 and the correlation ratio (eta) for
Y on X is .40, both computed for a sample of N = 100 with G = 14 intervals on
the x axis. The F for testing eta will have 13 and 86 degrees of freedom, whereas
the F for testing r will have 1 and 98 degrees of freedom. When the Fs are com-
puted and the significance levels determined, we have $F_{eta}$ = 1.26, P about .30,
and $F_r$ = 13.76, P about .001. How do you account for r, though smaller than
eta, being the more significant? (Do not forget that the larger the $n_1$ in the F
table, the greater the significance for Fs of the same size.)
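The two Fs of 17.5 can be formed from r, eta, N, and G. The sketch below uses the customary variance-ratio forms for testing a correlation ratio and a product-moment r; it is an assumption that these match the text's formulas exactly, and the function names are hypothetical.

```python
def f_for_eta(eta, n_cases, n_arrays):
    """F for eta with (G - 1) and (N - G) degrees of freedom."""
    df_between = n_arrays - 1
    df_within = n_cases - n_arrays
    return (eta ** 2 / df_between) / ((1.0 - eta ** 2) / df_within)


def f_for_r(r, n_cases):
    """F for a product-moment r with 1 and (N - 2) degrees of freedom."""
    return r ** 2 * (n_cases - 2) / (1.0 - r ** 2)


print(round(f_for_eta(0.40, 100, 14), 2))  # 1.26
print(round(f_for_r(0.32, 100), 2))
```

The first value reproduces the 1.26 given in the exercise. With r taken as .32 the second formula gives roughly 11.2 rather than 13.76; the printed F corresponds to an r near .35, so one of the two figures may have been garbled in reproduction. The point of the exercise is unaffected either way.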

17.6. When we have the Case XV setup, we can test the differences among the
possible B linear trends by following the procedure given on pp. 354-355.
Suppose we wish to test the differences between the quadratic components of the
B trend lines. How would you do this? (Note: You might first take out the
linear components or you might proceed directly to the quadratic part.)


CHAPTER 18 
18.1. Line 5 of Table 18.2 indicates the possibility of computing an r based on
between sums. Why would such an r be meaningless when C = 2?

... variables based on combined sex groups will be distorted by possible sex
differences on either or both variables. How might you proceed to obtain a
single r that is not distorted by the sex difference?

18.4. We failed to mention the problem of variance homogeneity when dis-
cussing the covariance adjustment technique. Specifically, what variances are
assumed to be homogeneous? Your reasoning?

18.5. An assumption underlying the covariance adjustment technique is homo-
geneity of regression from group to group. Does the text provide a method for
testing this assumption? Where (or what test)?

CHAPTER 19 


19.1. Suppose you have 22 cases measured under normal (control) conditions,
then measured under a prescribed experimental condition. You are interested
in evaluating the changes, but because of marked skewness (in which scores?)
you are skeptical of the tenability of the t test. Do you see a way of testing the
significance of the changes by a method for which you might use chi square as an
approximation?

19.2. A possible extension of the general idea of the median test for two or more
groups would be to classify the scores of each group into four categories accord-
ing to their position relative to $Q_1$, $Q_2$, and $Q_3$ based on the combined groups.
This would lead to a 4 by k table when we have k groups, from which a chi
square with 3(k - 1) df could be computed. Aside from difficulties when samples
are very small and the loss of efficiency caused by grouping, do you see any
possible problem in connection with the meaning of a significant chi square from
such a setup?

19.3. While reviewing a manuscript submitted to Psychological Monographs, 
the author encountered a two-way fixed effects design in which rows stood for 
eight different “treatments” and columns stood for two groups (pilots and non- 
pilots), with 16 (independent) cases per cell. Apparently the writer of the 
manuscript was a devotee of nonparametric methods: instead of testing the T 
by G interaction by the conventional F test with the within cells as the error term, 
he used Kendall's tau. If, his argument goes, the tau for the eight sets of 
means is significantly negative he would conclude that the interaction is significant.
This tau is, of course, the correlation between a rank-ordering of the means in the 
first column and a rank-ordering of the means in the second column, with n = 8 
pairs of ranks. For n = 8, tau must reach (negative) .49 for significance at the 
chosen level. Now there are three distinctly different bugs in this, every one of 
which nullifies his procedure. OK, try your critical powers. 


CHAPTER 20 


20.1. When experimental and control groups are set up on the basis of in- 
dividuals paired on two control variables, the gain in precision (or the error 
reduction) depends upon what fact(s) presumably available before carrying out 
the experiment ? 
20.2. Suppose you wish to do an experiment in which the cost per experimental
subject is far greater than that for a control. Accordingly you decide to take
$N_C$ as $4N_E$. What scheme, other than randomization, would you use to assure
comparability for the two groups? And how would you proceed to test for
significance the difference between the means of the two groups?
20.3. In a recent study of sex differences in problem solving (X), the fact of 
differences in general intelligence as measured by a college aptitude test (Y) was 
taken into account by comparing males and females who had been paired on Y. 
a. What statistical procedure do you think was used in testing the null 
hypothesis of no sex difference on X? 
b. Can you suggest an alternative experimental-statistical plan for
getting at sex difference on X with Y controlled?

20.4. For the large sample situation, the sampling variance of a mean based on a
sample stratified on variable U is given by $S^2_M = S^2_x(1 - r^2_{xu})/N$ and for the
sampling variance of the difference between means based on groups matched as
to distribution on control variable Y (not individual pairing), we have

$$S^2_D = \frac{S^2_{x_1}(1 - r^2_{xy})}{N_1} + \frac{S^2_{x_2}(1 - r^2_{xy})}{N_2}$$

Perhaps you will have noted the similarity of sampling variance for the stratified
sampling situation and the matched distributions situation. Do you see a con-
nection between the foregoing formulas and the analysis of variance technique?
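The first formula of 20.4 lends itself to a quick numerical illustration. The sketch below compares the sampling variance of a mean under stratification on U with the variance for an unstratified sample of the same size; the figures (S = 10, r = .60, N = 100) are hypothetical, as are the function names.

```python
def simple_random_mean_variance(s_x, n):
    """Large-sample sampling variance of a mean without stratification: S^2 / N."""
    return s_x ** 2 / n


def stratified_mean_variance(s_x, r_xu, n):
    """Sampling variance of a mean for a sample stratified on U: S^2 (1 - r^2) / N."""
    return s_x ** 2 * (1.0 - r_xu ** 2) / n


# Hypothetical values: S_x = 10, r_xu = .60, N = 100.
print(simple_random_mean_variance(10.0, 100))            # 1.0
print(round(stratified_mean_variance(10.0, 0.60, 100), 2))  # 0.64
```

The reduction is by the factor (1 - r squared), so the gain from stratifying depends entirely on how closely U is related to the variable being averaged.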
20.5. When attempting to evaluate the relative effect of two movies on attitudes 
(measured in a continuous fashion), we may form two groups by random assign- 
ment of individuals and then we may follow either (a) the procedure of a pretest, 
show movie, posttest, with the statistical analysis based on a comparison of the 
two mean changes or (b) the "after only" plan in which one movie is shown to 
each group after which both groups are tested and the difference between the 
resulting "after" means is tested for significance. This second, or “after only,” 
method is frequently more feasible than the first procedure. In general, which 
design would you expect to be more precise? Why? Can you specify a condition 
which might make the other design more precise? (Hint: presume that all four 
possible standard deviations are equal.) 
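
As a numerical aid for exercise 20.4, the following sketch evaluates the two large-sample sampling variances discussed there. It assumes the stratified-sample form S²x(1 − r²xu)/N and, for matched groups, the sum of two such terms with r_xy in place of r_xu. The function names and the illustrative numbers are hypothetical and do not come from the text.

```python
def stratified_mean_variance(s2_x, r_xu, n):
    """Large-sample sampling variance of a mean from a sample stratified on U.

    Assumed form: S_x^2 * (1 - r_xu^2) / N.
    """
    return s2_x * (1.0 - r_xu ** 2) / n


def matched_difference_variance(s2_x1, s2_x2, r_xy, n1, n2):
    """Sampling variance of a difference between means of groups matched on Y.

    Assumed form: each group's S_x^2 / N is reduced by the factor (1 - r_xy^2).
    """
    shrink = 1.0 - r_xy ** 2
    return s2_x1 * shrink / n1 + s2_x2 * shrink / n2


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    s2, r, n = 100.0, 0.6, 50
    print("simple random sample:", s2 / n)
    print("stratified on U:     ", stratified_mean_variance(s2, r, n))
    print("matched groups diff: ", matched_difference_variance(s2, s2, r, n, n))
```

With r = .6 in this made-up case, each variance is 64 per cent of the value it would have without stratification or matching.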


Appendix 
TABLES A TO G 
Table A. Normal curve functions 


z or x/σ   Area: m to z   Area: q smaller   y or ordinate 
  .00        .00000          .50000           .3989 
  .05        .01994          .48006           .3984 
  .10        .03983          .46017           .3970 
  .15        .05962          .44038           .3945 
  .20        .07926          .42074           .3910 
  .25        .09871          .40129           .3867 
  .30        .11791          .38209           .3814 
  .35        .13683          .36317           .3752 
  .40        .15542          .34458           .3683 
  .45        .17364          .32636           .3605 
  .50        .19146          .30854           .3521 
  .55        .20884          .29116           .3429 
  .60        .22575          .27425           .3332 
  .65        .24215          .25785           .3230 
  .70        .25804          .24196           .3123 
  .75        .27337          .22663           .3011 
  .80        .28814          .21186           .2897 
  .85        .30234          .19766           .2780 
  .90        .31594          .18406           .2661 
  .95        .32894          .17106           .2541 

 1.00        .34134          .15866           .2420 
 1.05        .35314          .14686           .2299 
 1.10        .36433          .13567           .2179 
 1.15        .37493          .12507           .2059 
 1.20        .38493          .11507           .1942 
 1.25        .39435          .10565           .1826 
 1.30        .40320          .09680           .1714 
 1.35        .41149          .08851           .1604 
 1.40        .41924          .08076           .1497 
 1.45        .42647          .07353           .1394 
 1.50        .43319          .06681           .1295 
 1.55        .43943          .06057           .1200 
 1.60        .44520          .05480           .1109 
 1.65        .45053          .04947           .1023 
 1.70        .45543          .04457           .0940 
 1.75        .45994          .04006           .0863 
 1.80        .46407          .03593           .0790 
 1.85        .46784          .03216           .0721 
 1.90        .47128          .02872           .0656 
 1.95        .47441          .02559           .0596 

 2.00        .47725          .02275           .0540 
 2.05        .47982          .02018           .0488 
 2.10        .48214          .01786           .0440 
 2.15        .48422          .01578           .0396 
 2.20        .48610          .01390           .0355 
 2.25        .48778          .01222           .0317 
 2.30        .48928          .01072           .0283 
 2.35        .49061          .00939           .0252 
 2.40        .49180          .00820           .0224 
 2.45        .49286          .00714           .0198 
 2.50        .49379          .00621           .0175 
 2.55        .49461          .00539           .0154 
 2.60        .49534          .00466           .0136 
 2.65        .49598          .00402           .0119 
 2.70        .49653          .00347           .0104 
 2.75        .49702          .00298           .0091 
 2.80        .49744          .00256           .0079 
 2.85        .49781          .00219           .0069 
 2.90        .49813          .00187           .0060 
 2.95        .49841          .00159           .0051 

 3.00        .49865          .00135           .0044 
 3.25        .49942          .00058           .0020 
 3.50        .49977          .00023           .0009 
 3.75        .49991          .00009           .0004 
 4.00        .49997          .00003           .0001 
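
For values of z that fall between the tabled entries, the three functions of Table A can be computed directly instead of interpolated. The following is a minimal sketch in Python, assuming the columns are the area from the mean to z, the area q in the smaller part, and the ordinate y of the unit normal curve. The function name is illustrative only.

```python
import math


def normal_curve_functions(z):
    """Return (area from mean to z, area q in the smaller part, ordinate y)
    for a unit normal curve at a nonnegative deviate z."""
    if z < 0:
        raise ValueError("z is taken as a nonnegative deviate, as in Table A")
    area_m_to_z = 0.5 * math.erf(z / math.sqrt(2.0))
    q_smaller = 0.5 - area_m_to_z
    ordinate = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return area_m_to_z, q_smaller, ordinate


if __name__ == "__main__":
    for z in (0.50, 1.00, 1.96, 2.58):
        area, q, y = normal_curve_functions(z)
        print(f"z = {z:4.2f}  m to z = {area:.5f}  q = {q:.5f}  y = {y:.4f}")
```

The two areas always sum to .50000, which gives a quick check on any tabled row.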
Table B. Transformation of r to z 


  r      z        r      z        r      z 
 .01   .010      .34   .354      .67    .811 
 .02   .020      .35   .366      .68    .829 
 .03   .030      .36   .377      .69    .848 
 .04   .040      .37   .389      .70    .867 
 .05   .050      .38   .400      .71    .887 
 .06   .060      .39   .412      .72    .908 
 .07   .070      .40   .424      .73    .929 
 .08   .080      .41   .436      .74    .950 
 .09   .090      .42   .448      .75    .973 
 .10   .100      .43   .460      .76    .996 
 .11   .110      .44   .472      .77   1.020 
 .12   .121      .45   .485      .78   1.045 
 .13   .131      .46   .497      .79   1.071 
 .14   .141      .47   .510      .80   1.099 
 .15   .151      .48   .523      .81   1.127 
 .16   .161      .49   .536      .82   1.157 
 .17   .172      .50   .549      .83   1.188 
 .18   .181      .51   .563      .84   1.221 
 .19   .192      .52   .577      .85   1.256 
 .20   .203      .53   .590      .86   1.293 
 .21   .214      .54   .604      .87   1.333 
 .22   .224      .55   .618      .88   1.376 
 .23   .234      .56   .633      .89   1.422 
 .24   .245      .57   .648      .90   1.472 
 .25   .256      .58   .663      .91   1.528 
 .26   .266      .59   .678      .92   1.589 
 .27   .277      .60   .693      .93   1.658 
 .28   .288      .61   .709      .94   1.738 
 .29   .299      .62   .725      .95   1.832 
 .30   .309      .63   .741      .96   1.946 
 .31   .321      .64   .758      .97   2.092 
 .32   .332      .65   .775      .98   2.298 
 .33   .343      .66   .793      .99   2.647 
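
The r to z transformation of Table B is the inverse hyperbolic tangent, z = ½ logₑ[(1 + r)/(1 − r)], so any entry can be computed directly. The sketch below assumes Python's standard `math.atanh`; the function name `r_to_z` is illustrative only. Computed values may differ from the printed entries by a unit in the third decimal place.

```python
import math


def r_to_z(r):
    """Fisher's transformation of a correlation coefficient r to z."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    return math.atanh(r)  # same as 0.5 * log((1 + r) / (1 - r))


if __name__ == "__main__":
    for r in (0.10, 0.50, 0.90, 0.99):
        print(f"r = {r:.2f}  z = {r_to_z(r):.3f}")
```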
Table C. Transformation of z to r* 


  z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09 


  .0  .0000  .0100  .0200  .0300  .0400  .0500  .0599  .0699  .0798  .0898 
  .1  .0997  .1096  .1194  .1293  .1391  .1489  .1586  .1684  .1781  .1877 
  .2  .1974  .2070  .2165  .2260  .2355  .2449  .2543  .2636  .2729  .2821 
  .3  .2913  .3004  .3095  .3185  .3275  .3364  .3452  .3540  .3627  .3714 
  .4  .3800  .3885  .3969  .4053  .4136  .4219  .4301  .4382  .4462  .4542 


  .5  .4621  .4699  .4777  .4854  .4930  .5005  .5080  .5154  .5227  .5299 
  .6  .5370  .5441  .5511  .5580  .5649  .5717  .5784  .5850  .5915  .5980 
  .7  .6044  .6107  .6169  .6231  .6291  .6351  .6411  .6469  .6527  .6584 
  .8  .6640  .6696  .6751  .6805  .6858  .6911  .6963  .7014  .7064  .7114 
  .9  .7163  .7211  .7259  .7306  .7352  .7398  .7443  .7487  .7531  .7574 


 1.0  .7616  .7658  .7699  .7739  .7779  .7818  .7857  .7895  .7932  .7969 
 1.1  .8005  .8041  .8076  .8110  .8144  .8178  .8210  .8243  .8275  .8306 
 1.2  .8337  .8367  .8397  .8426  .8455  .8483  .8511  .8538  .8565  .8591 
 1.3  .8617  .8643  .8668  .8692  .8717  .8741  .8764  .8787  .8810  .8832 
 1.4  .8854  .8875  .8896  .8917  .8937  .8957  .8977  .8996  .9015  .9033 


 1.5  .9051  .9069  .9087  .9104  .9121  .9138  .9154  .9170  .9186  .9201 
 1.6  .9217  .9232  .9246  .9261  .9275  .9289  .9302  .9316  .9329  .9341 
 1.7  .9354  .9366  .9379  .9391  .9402  .9414  .9425  .9436  .9447  .9458 
 1.8  .9468  .9478  .9488  .9498  .9508  .9518  .9527  .9536  .9545  .9554 
 1.9  .9562  .9571  .9579  .9587  .9595  .9603  .9611  .9618  .9626  .9633 


 2.0  .9640  .9647  .9654  .9661  .9668  .9674  .9680  .9686  .9693  .9699 
 2.1  .9704  .9710  .9716  .9722  .9727  .9732  .9738  .9743  .9748  .9753 
 2.2  .9757  .9762  .9767  .9771  .9776  .9780  .9785  .9789  .9793  .9797 
 2.3  .9801  .9805  .9809  .9812  .9816  .9820  .9823  .9827  .9830  .9834 
 2.4  .9837  .9840  .9843  .9846  .9849  .9852  .9855  .9858  .9861  .9864 


 2.5  .9866  .9869  .9871  .9874  .9876  .9879  .9881  .9884  .9886  .9888 
 2.6  .9890  .9892  .9894  .9897  .9899  .9901  .9903  .9904  .9906  .9908 
 2.7  .9910  .9912  .9914  .9915  .9917  .9919  .9920  .9922  .9923  .9925 
 2.8  .9926  .9928  .9929  .9931  .9932  .9933  .9935  .9936  .9937  .9938 
 2.9  .9940  .9941  .9942  .9943  .9944  .9945  .9946  .9948  .9948  .9950 


* Table C is abridged from Table VII of Fisher and Yates: Statistical tables for 
biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, 
by permission of the authors and publishers. 
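
Table C reverses Table B: a z value is carried back to r by the hyperbolic tangent. The following is a minimal sketch in Python, assuming the standard `math.tanh`; the function name `z_to_r` is illustrative only. The row and column of the table are combined as z = row + column, so the row 1.0 and column .05 correspond to z = 1.05.

```python
import math


def z_to_r(z):
    """Convert Fisher's z back to a correlation coefficient r."""
    return math.tanh(z)  # same as (e^(2z) - 1) / (e^(2z) + 1)


if __name__ == "__main__":
    for z in (0.55, 1.00, 1.05, 2.00):
        print(f"z = {z:.2f}  r = {z_to_r(z):.4f}")
```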
Table D. Distribution of χ²* 


 n   P = .99     .98     .95     .90     .80     .70     .50 
 1    .00016  .00063   .0039    .016    .064     .15     .46 
 2       .02     .04     .10     .21     .45     .71    1.39 
 3       .12     .18     .35     .58    1.00    1.42    2.37 
 4       .30     .43     .71    1.06    1.65    2.20    3.36 
 5       .55     .75    1.14    1.61    2.34    3.00    4.35 
 6       .87    1.13    1.64    2.20    3.07    3.83    5.35 
 7      1.24    1.56    2.17    2.83    3.82    4.67    6.35 
 8      1.65    2.03    2.73    3.49    4.59    5.53    7.34 
 9      2.09    2.53    3.32    4.17    5.38    6.39    8.34 
10      2.56    3.06    3.94    4.86    6.18    7.27    9.34 
11      3.05    3.61    4.58    5.58    6.99    8.15   10.34 
12      3.57    4.18    5.23    6.30    7.81    9.03   11.34 
13      4.11    4.76    5.89    7.04    8.63    9.93   12.34 
14      4.66    5.37    6.57    7.79    9.47   10.82   13.34 
15      5.23    5.98    7.26    8.55   10.31   11.72   14.34 
16      5.81    6.61    7.96    9.31   11.15   12.62   15.34 
17      6.41    7.26    8.67   10.08   12.00   13.53   16.34 
18      7.02    7.91    9.39   10.86   12.86   14.44   17.34 
19      7.63    8.57   10.12   11.65   13.72   15.35   18.34 
20      8.26    9.24   10.85   12.44   14.58   16.27   19.34 
21      8.90    9.92   11.59   13.24   15.44   17.18   20.34 
22      9.54   10.60   12.34   14.04   16.31   18.10   21.34 
23     10.20   11.29   13.09   14.85   17.19   19.02   22.34 
24     10.86   11.99   13.85   15.66   18.06   19.94   23.34 
25     11.52   12.70   14.61   16.47   18.94   20.87   24.34 
26     12.20   13.41   15.38   17.29   19.82   21.79   25.34 
27     12.88   14.12   16.15   18.11   20.70   22.72   26.34 
28     13.56   14.85   16.93   18.94   21.59   23.65   27.34 
29     14.26   15.57   17.71   19.77   22.48   24.58   28.34 
30     14.95   16.31   18.49   20.60   23.36   25.51   29.34 


Table D. Distribution of χ²* (continued) 


 n   P = .30     .20     .10     .05     .02     .01    .001 
 1      1.07    1.64    2.71    3.84    5.41    6.64   10.83 
 2      2.41    3.22    4.60    5.99    7.82    9.21   13.82 
 3      3.66    4.64    6.25    7.82    9.84   11.34   16.27 
 4      4.88    5.99    7.78    9.49   11.67   13.28   18.46 
 5      6.06    7.29    9.24   11.07   13.39   15.09   20.52 
 6      7.23    8.56   10.64   12.59   15.03   16.81   22.46 
 7      8.38    9.80   12.02   14.07   16.62   18.48   24.32 
 8      9.52   11.03   13.36   15.51   18.17   20.09   26.12 
 9     10.66   12.24   14.68   16.92   19.68   21.67   27.88 
10     11.78   13.44   15.99   18.31   21.16   23.21   29.59 
11     12.90   14.63   17.28   19.68   22.62   24.72   31.26 
12     14.01   15.81   18.55   21.03   24.05   26.22   32.91 
13     15.12   16.98   19.81   22.36   25.47   27.69   34.53 
14     16.22   18.15   21.06   23.68   26.87   29.14   36.12 
15     17.32   19.31   22.31   25.00   28.26   30.58   37.70 
16     18.42   20.46   23.54   26.30   29.63   32.00   39.25 
17     19.51   21.62   24.77   27.59   31.00   33.41   40.79 
18     20.60   22.76   25.99   28.87   32.35   34.80   42.31 
19     21.69   23.90   27.20   30.14   33.69   36.19   43.82 
20     22.78   25.04   28.41   31.41   35.02   37.57   45.32 
21     23.86   26.17   29.62   32.67   36.34   38.93   46.80 
22     24.94   27.30   30.81   33.92   37.66   40.29   48.27 
23     26.02   28.43   32.01   35.17   38.97   41.64   49.73 
24     27.10   29.55   33.20   36.42   40.27   42.98   51.18 
25     28.17   30.68   34.38   37.65   41.57   44.31   52.62 
26     29.25   31.80   35.56   38.88   42.86   45.64   54.05 
27     30.32   32.91   36.74   40.11   44.14   46.96   55.48 
28     31.39   34.03   37.92   41.34   45.42   48.28   56.89 
29     32.46   35.14   39.09   42.56   46.69   49.59   58.30 
30     33.53   36.25   40.26   43.77   47.96   50.89   59.70 


* Table D is abridged from Table IV of Fisher and Yates: Statistical tables for 
biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, 
by permission of the authors and publishers. 
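
When an obtained χ² falls between two tabled values, its exact probability can be computed rather than bracketed. The sketch below is one way of doing this with the Python standard library only, assuming integer degrees of freedom n and using the closed-form finite series for the upper-tail area (a separate expression for even and for odd n). The function name is illustrative, and this is not the method by which the table itself was prepared.

```python
import math


def chi_square_upper_tail(x, df):
    """P that chi-square with integer df degrees of freedom is >= x."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-tailed normal area beyond sqrt(x), plus
    # 2 * ordinate(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    two_tail_normal = math.erfc(root / math.sqrt(2.0))
    ordinate = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return two_tail_normal + 2.0 * ordinate * total


if __name__ == "__main__":
    for x, df in ((3.84, 1), (5.99, 2), (7.82, 3), (18.31, 10)):
        print(f"chi-square = {x:5.2f}, n = {df:2d}, P = {chi_square_upper_tail(x, df):.4f}")
```

Each of the four illustrative calls uses a value from the .05 column, so the printed probabilities should all be close to .05.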


Table E. Distribution of t* 


n P = .1 .05 .02 .01 .001 
1 6.314 12.706 31.821 63.657 636.619 
2 2.920 4.303 6.965 9.925 31.598 
3 2.353 3.182 4.541 5.841 12.941 
4 2.132 2.776 3.747 4.604 8.610 
5 2.015 2.571 3.365 4.032 6.859 
6 1.943 2.447 3.143 3.707 5.959 
7 1.895 2.365 2.998 3.499 5.405 
8 1.860 2.306 2.896 3.355 5.041 
9 1.833 2.262 2.821 3.250 4.781 
10 1.812 2.228 2.764 3.169 4.587 
11 1.796 2.201 2.718 3.106 4.437 
12 1.782 2.179 2.681 3.055 4.318 
13 1.771 2.160 2.650 3.012 4.221 
14 1.761 2.145 2.624 2.977 4.140 
15 1.753 2.131 2.602 2.947 4.073 
16 1.746 2.120 2.583 2.921 4.015 
17 1.740 2.110 2.567 2.898 3.965 
18 1.734 2.101 2.552 2.878 3.922 
19 1.729 2.093 2.539 2.861 3.883 
20 1.725 2.086 2.528 2.845 3.850 
21 1.721 2.080 2.518 2.831 3.819 
22 1.717 2.074 2.508 2.819 3.792 
23 1.714 2.069 2.500 2.807 3.767 
24 1.711 2.064 2.492 2.797 3.745 
25 1.708 2.060 2.485 2.787 3.725 
26 1.706 2.056 2.479 2.779 3.707 
27 1.703 2.052 2.473 2.771 3.690 
28 1.701 2.048 2.467 2.763 3.674 
29 1.699 2.045 2.462 2.756 3.659 
30 1.697 2.042 2.457 2.750 3.646 
40 1.684 2.021 2.423 2.704 3.551 
60 1.671 2.000 2.390 2.660 3.460 
120 1.658 1.980 2.358 2.617 3.373 
∞ 1.645 1.960 2.326 2.576 3.291 


* Table E is abridged from Table III of Fisher and Yates: Statistical tables for 
biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, 
by permission of the authors and publishers. 
Table F. Table of F for .05 (roman), .01 (italic), and .001 (bold face) 
levels of significance* 


n2    P    n1 = 1       2       3       4       5       6       8      12      24       ∞ 


 1   .05     161     200     216     225     230     234     239     244     249     254 
     .01    4052    4999    5403    5625    5764    5859    5981    6106    6234    6366 
     .001 405284  500000  540379  562500  576405  585937  598144  610667  623497  636619 

 2   .05   18.51   19.00   19.16   19.25   19.30   19.33   19.37   19.41   19.45   19.50 
     .01   98.49   99.01   99.17   99.25   99.30   99.33   99.36   99.42   99.46   99.50 
     .001  998.5   999.0   999.2   999.2   999.3   999.3   999.4   999.4   999.5   999.5 

 3   .05   10.13    9.55    9.28    9.12    9.01    8.94    8.84    8.74    8.64    8.53 
     .01   34.12   30.81   29.46   28.71   28.24   27.91   27.49   27.05   26.60   26.12 
     .001  167.5   148.5   141.1   137.1   134.6   132.8   130.6   128.3   125.9   123.5 

 4   .05    7.71    6.94    6.59    6.39    6.26    6.16    6.04    5.91    5.77    5.63 
     .01   21.20   18.00   16.69   15.98   15.52   15.21   14.80   14.37   13.93   13.46 
     .001  74.14   61.25   56.18   53.44   51.71   50.53   49.00   47.41   45.77   44.05 

 5   .05    6.61    5.79    5.41    5.19    5.05    4.95    4.82    4.68    4.53    4.36 
     .01   16.26   13.27   12.06   11.39   10.97   10.67   10.27    9.89    9.47    9.02 
     .001  47.04   36.61   33.20   31.09   29.75   28.84   27.64   26.42   25.14   23.78 

 6   .05    5.99    5.14    4.76    4.53    4.39    4.28    4.15    4.00    3.84    3.67 
     .01   13.74   10.92    9.78    9.15    8.75    8.47    8.10    7.72    7.31    6.88 
     .001  35.51   27.00   23.70   21.90   20.81   20.03   19.03   17.99   16.89   15.75 

 7   .05    5.59    4.74    4.35    4.12    3.97    3.87    3.73    3.57    3.41    3.23 
     .01   12.25    9.55    8.45    7.85    7.46    7.19    6.84    6.47    6.07    5.65 
     .001  29.22   21.69   18.77   17.19   16.21   15.52   14.63   13.71   12.73   11.69 

 8   .05    5.32    4.46    4.07    3.84    3.69    3.58    3.44    3.28    3.12    2.93 
     .01   11.26    8.65    7.59    7.01    6.63    6.37    6.03    5.67    5.28    4.86 
     .001  25.42   18.49   15.83   14.39   13.49   12.86   12.04   11.19   10.30    9.34 

 9   .05    5.12    4.26    3.86    3.63    3.48    3.37    3.23    3.07    2.90    2.71 
     .01   10.56    8.02    6.99    6.42    6.06    5.80    5.47    5.11    4.73    4.31 
     .001  22.86   16.39   13.90   12.56   11.71   11.13   10.37    9.57    8.72    7.81 

10   .05    4.96    4.10    3.71    3.48    3.33    3.22    3.07    2.91    2.74    2.54 
     .01   10.04    7.56    6.55    5.99    5.64    5.39    5.06    4.71    4.33    3.91 
     .001  21.04   14.91   12.55   11.28   10.48    9.92    9.20    8.45    7.64    6.76 

11   .05    4.84    3.98    3.59    3.36    3.20    3.09    2.95    2.79    2.61    2.40 
     .01    9.65    7.20    6.22    5.67    5.32    5.07    4.74    4.40    4.02    3.60 
     .001  19.69   13.81   11.56   10.35    9.58    9.05    8.35    7.63    6.85    6.00 

12   .05    4.75    3.88    3.49    3.26    3.11    3.00    2.85    2.69    2.50    2.30 
     .01    9.33    6.93    5.95    5.41    5.06    4.82    4.50    4.16    3.78    3.36 
     .001  18.64   12.97   10.80    9.63    8.89    8.38    7.71    7.00    6.25    5.42 


* Table F is reprinted, in rearranged form, from Table V of Fisher and Yates: Statistical 
tables for biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, by 
permission of the authors and publishers. 
Table F. Table of F for .05 (roman), .01 (italic), and .001 (bold face) 
levels of significance* (continued) 


n2    P    n1 = 1       2       3       4       5       6       8      12      24       ∞ 


13   .01    9.07    6.70    5.74    5.20    4.86    4.62    4.30    3.96    3.59    3.16 


14   .01    8.86    6.51    5.56    5.03    4.69    4.46    4.14    3.80    3.43    3.00 


15   .01    8.68    6.36    5.42    4.89    4.56    4.32    4.00    3.67    3.29    2.87 


16   .01    8.53    6.23    5.29    4.77    4.44    4.20    3.89    3.55    3.18    2.75 


4.85 4.06 

17   .05    4.45    3.59    3.20    2.96    2.81    2.70    2.55    2.38    2.19    1.96 
     .01    8.40    6.11    5.18    4.67    4.34    4.10    3.79    3.45    3.08    2.65 
     .001  15.72   10.66    8.73    7.68    7.02    6.56    5.96    5.32    4.63    3.85 

18   .05    4.41    3.55    3.16    2.93    2.77    2.66    2.51    2.34    2.15    1.92 
     .01    8.28    6.01    5.09    4.58    4.25    4.01    3.71    3.37    3.00    2.57 
     .001  15.38   10.39    8.49    7.46    6.81    6.35    5.76    5.13    4.45    3.67 

19   .05    4.38    3.52    3.13    2.90    2.74    2.63    2.48    2.31    2.11    1.88 
     .01    8.18    5.93    5.01    4.50    4.17    3.94    3.63    3.30    2.92    2.49 
     .001  15.08   10.16    8.28    7.26    6.61    6.18    5.59    4.97    4.29    3.52 

20   .05    4.35    3.49    3.10    2.87    2.71    2.60    2.45    2.28    2.08    1.84 
     .01    8.10    5.85    4.94    4.43    4.10    3.87    3.56    3.23    2.86    2.42 
     .001  14.82    9.95    8.10    7.10    6.46    6.02    5.44    4.82    4.15    3.38 

432 347 3.07 284 268 257 2.42 1.81 

2.36 

3.26 

1.78 

231 

3.15 

1.76 

2.26 

3.05 

1.73 

2.21 

2.97 


Table F. Table of F for .05 (roman), .01 (italic), and .001 (bold face) 
levels of significance* (continued) 


n2    P    n1 = 1       2       3       4       5       6       8      12      24       ∞ 


25   .05    4.24    3.38    2.99    2.76    2.60    2.49    2.34    2.16    1.96    1.71 
     .01    7.77    5.57    4.68    4.18    3.86    3.63    3.32    2.99    2.62    2.17 
     .001  13.88    9.22    7.45    6.49    5.88    5.46    4.91    4.31    3.66    2.89 

26   .05    4.22    3.37    2.98    2.74    2.59    2.47    2.32    2.15    1.95    1.69 
     .01    7.72    5.53    4.64    4.14    3.82    3.59    3.29    2.96    2.58    2.13 
     .001  13.74    9.12    7.36    6.41    5.80    5.38    4.83    4.24    3.59    2.82 

27   .05    4.21    3.35    2.96    2.73    2.57    2.46    2.30    2.13    1.93    1.67 
     .01    7.68    5.49    4.60    4.11    3.78    3.56    3.26    2.93    2.55    2.10 
     .001  13.61    9.02    7.27    6.33    5.73    5.31    4.76    4.17    3.52    2.75 

28   .05    4.20    3.34    2.95    2.71    2.56    2.44    2.29    2.12    1.91    1.65 
     .01    7.64    5.45    4.57    4.07    3.75    3.53    3.23    2.90    2.52    2.06 
     .001  13.50    8.93    7.19    6.25    5.66    5.24    4.69    4.11    3.46    2.70 

29   .05    4.18    3.33    2.93    2.70    2.54    2.43    2.28    2.10    1.90    1.64 
     .01    7.60    5.42    4.54    4.04    3.73    3.50    3.20    2.87    2.49    2.03 
     .001  13.39    8.85    7.12    6.19    5.59    5.18    4.64    4.05    3.41    2.64 

30   .05    4.17    3.32    2.92    2.69    2.53    2.42    2.27    2.09    1.89    1.62 
     .01    7.56    5.39    4.51    4.02    3.70    3.47    3.17    2.84    2.47    2.01 
     .001  13.29    8.77    7.05    6.12    5.53    5.12    4.58    4.00    3.36    2.59 

40   .05    4.08    3.23    2.84    2.61    2.45    2.34    2.18    2.00    1.79    1.51 
     .01    7.31    5.18    4.31    3.83    3.51    3.29    2.99    2.66    2.29    1.80 
     .001  12.61    8.25    6.60    5.70    5.13    4.73    4.21    3.64    3.01    2.23 

60   .05    4.00    3.15    2.76    2.52    2.37    2.25    2.10    1.92    1.70    1.39 
     .01    7.08    4.98    4.13    3.65    3.34    3.12    2.82    2.50    2.12    1.60 
     .001  11.97    7.76    6.17    5.31    4.76    4.37    3.87    3.31    2.69    1.90 

120  .05    3.92    3.07    2.68    2.45    2.29    2.17    2.02    1.83    1.61    1.25 
     .01    6.85    4.79    3.95    3.48    3.17    2.96    2.66    2.34    1.95    1.38 
     .001  11.38    7.31    5.79    4.95    4.42    4.04    3.55    3.02    2.40    1.54 

 ∞   .05    3.84    2.99    2.60    2.37    2.21    2.09    1.94    1.75    1.52    1.00 
     .01    6.64    4.60    3.78    3.32    3.02    2.80    2.51    2.18    1.79    1.00 
     .001  10.83    6.91    5.42    4.62    4.10    3.74    3.27    2.74    2.13    1.00 


Table G. Squares and square roots 


1.00000 | 3. P 1.22474 | 3.87298 


d 1.22882 | 3.88587 
1.00595 | Я $ T] 3.89872 
1.01489 | 5; E: 1.23693 | 3.91152 


4 3.22490 E $ 1.24097 | 3.92428 
102350 3124037 3 К 1.24499 | 3.93700 
1.02956 | 3.25576 E AS 1.24900 | 3.94968 


1.03441 | 3.27109 . z 1.25300 | 3.96232 
1.03923 | 3.28634 s н 1.25698 | 3.97492 


EE :04405 | 3.30151 е 5 1.26095 | 3.98748 
[ў 5.51662 


3.33167 5 n 4.01248 

3 .62 1.27279 | 4.02492 
1.06301 B T al 1.27671 | 4.03733 
1.06771 | 3.37639 “ T 1.28062 | 4.04969 
1.07238 | 3.39116 4 722: 1.28452 | 4.06202 
1.07703 | 3.40588 9 5 1.28841 | 4.07431 


1.08167 | 3.42053 
1.08628 | 3.43511 
1.09087 | 3.44964 


5.46410 
1.10000 | 3.47851 


1.10454 | 3.49285 
1.10905 | 3.50714 


1.29228 | 4.08656 
1.29615 | 4.09878 
1.50000 | 4.11096 


1.30384 | 4.12311 


1.30767 | 4.13521 
1.51149 | 4.14729 
1.51529 | 4.15933 


1.11355 | 3.52136 4 E 1.31909 | 4.17155 
1.11803 | 3.53553 А el 1.32288 


aan 
CoQ 


© 


Ree югы CEE n 
х е 


ane 
кее 


4.18330 
1.12250 | 3.54965 d А 1.32665 | 4.19524 


1.12694 | 3.56371 a! Я 1.53041 | 4.20714 
1.15157 | 3.57771 T . 1.55417 | 4.21900 
3.59166 5 к 1.53791 | 4.23084 


1.14018 | 3.60555 


114455 20129 E > 1.34536 | 4.25441 
б 1 8 Я e 1.3490; » 
1.15326 3.64692 E ^ prn 


1.35277 | 4.27785 
1.15758 | 3.66060 d a 1.3 
1.16190 | 5.67423 и * 130607 КЕЛ 
1.16619 | 3168782 E х 1.36382 | 431277 
1.17047 | 3.70135 


д ў 1.367: 

1.17473 | 3.71484 à d 137718 13286 

1.17898 | 372827 К й 1.37477 | 4.34741 

.18322 | 3.74166 1.37840: | 4.55890 
T 

-18743 | 3. 


к d 1.38203 | 4.3703 
-19164 7 E - 1.38564 4.38178 
1.19583 м А 1.58924 | 439318 


1.20000 | 3.79475 К i 1.39284 | 4. 
1.20416 | 3.80789 R . 1.39642 AA 
1.20830 | 3.82099 R ai 1.40000 4.42719 
1.21244 | 3.83406 


T Й 1.40357 | 4. 
1.21655 | 3.84708 у 4 1.40712 ы? 
1.22066 | 3.86005 К ae L141067 | 4.46094 
1.22474 | 3.87298 
Table G. Squares and square roots (continued) 


VN 


1.41421 


4.47214 


VN 
1.58114 


5.00000 


BEN PEN ы 
SRE @ 


vo 


1.41774 
1.42127 
1.42478 


1.42829 
1.45178 
1.43527 


1.43875 
1.44222 
1.44568 


4.48330 
4.49444 
4.50555 


4.51664 
4.52769 
4.53872 


4.54973 
4.56070 
4.57165 


BRN Kun ююю ie 
Now ёлёлёл | ё 
Hae ана @ 


ъё 
ооч 


1.58430 
1.58745 
1.59060 


1.59574 
1.59687 
1.60000 


1.60512 
1.60624 
1.60935 


5.00999 
5.01996 
5.02991 


5.03984 
5.04975 
5.05964 


5.06952 
5.07937 
5.08920 


e 


1.44914 


4.58258 


2 
e 


1.61245 


5.09902 


-|E|9e99 


Rey NNN (rere Юю 
Bee Cee] i 


WON ANB аю 


1.45258 
1.45602 
1.45945 


1.46287 
1.46629 
1.46969 


1.47309 
1.47648 
1.47986 


4.59547 
4.60455 
4.61519 


4.62601 
4.63681 
4.64758 


4.65835 
4.66905 
4.67974 


"mmm 
gaa 
ga 


ee 
AnD 
оо 


a 
ооч 


1.61555 
1.61864 
1.62173 


1.62481 
1.62788 
1.63095 


1.63401 
1.65707 
1.64012 


5.10882 
5.11859 
5.12835 


5.13809 
5.14782 
5.15752 


5.16720 
5.17687 
5.18652 


bj HHH 
© 


2. 
2. 
2 
2. 
2. 
2. 
2. 
2. 
2. 
2. 


bit iii Rii 
SBS o2 QNS 


1.48324 


4.69042 


e 


1.64317 


5.19615 


1.48661 
1.48997 
1.49332 


1.49666 
1.50000 
1.50333 


1.50665 
1.50997 
1.51327 


4.70106 
4.71169 
4.72229 


4.75286 
4.74542 
4.75595 


4.76445 
4.77495 
4.78559 


NNN NIIS. MISES | DP] юр 
SM эмм м ane 
CaN Aan av=]3s| SB 


ERI 


1.64621 
1.64924 
1.65227 


1.65529 
1.65831 
1.66152 


1.66435 
1.66733 
1.67033 


5.20577 
5.21536 
5.22494 


5.25450 
5.24404 
5.25357 


5.26308 
5.27257 
5.28205 


1.51658 


4.79585 


© 


1.67332 


5.29150 


RN ююю Nieto | bd 
ы ыыы DIG 
Sau ase ани 


Se DOR e 
$*5 бок бее рады 


Per wiw wej 


ю 
КЧ 
© 


1.51987 
1.52315 
1.52643 


1.52971 
1.53297 
1.53623 


1.53948 
1.54272 
1.54596 


4.80625 
4.81664 
4.82701 


4.83735 
4.84768 
4.85798 


4.86826 
4.87852 
4.88876 


New] S 


B NNN 
о www pwo 
$963 oos anels 


1.67631 
1.67929 
1.68226 


1.68523 
1.68819 
1.69115 


1.69411 
1.69706 
1.70000 


5.30094 
5.31037 
5.31977 


5.32917 
5.33854 
5.34790 


5.35724 
5.36656 
5.37587 


1.54919 


4.89898 


©] ow 
© 


1.70294 


5.38516 


1.55242 
1.55563 
1.55885 


1.56205 
1.56525 
1.56844 


1.57162 
1.57480 
1.57797 


4.90918 
4.91955 
4.92950 


4.93964 
4.94975 
4.95984 


4.96991 
4.97996 
4.98999 


рю NNN NNN] юю 
оо оо tok toG 


1.70587 
1.70880 
1.71172 


1.71464 
1.71756 
1.72047 


1.72337 
1.72627 
1.72916 


5.39444 
5.403570 
5.41295 


5.42218 
5.43139 
5.44059 


5.44977 
5.45894 
5.46809 


VN 


1.58114 


5.00000 
M/10N 


eS) 
oji 
s/o 


(2.75205 | 


YN 


5.47723 


Table G. Squares and square roots (continued) 


VN 
1.87083 | 5.91608 


1.87350 | 5.92453 
1.87617 | 5.95296 
1.87883 | 5.94138 


1.88149 
1.88414 
1.88680 


1.88944 | 5.97495 
1.89209 | 5.98531 
1.89473 | 5.99166 


1.89737 | 6.00000 


13.0321 | 1.90000 | 6.00833 
13.1044 | 1.90263 | 6.01664 
13.1769 | 1.90526 | 6.02495 


13.2496 | 1.90788 | 6.03324 
13.322; 1.91050 | 6.04152 
13.5956 | 1.91311 | 6.04979 


9.0000 | 1.73205 


9.0601 | 1.73494 
9.1204 | 1.73781 
9.1809 | 1.74069 


9.2416 | 1.74356 
9.3025 | 1.74642 
9.3636 | 1.74929 


БА ЗЕ 


ооч Fak Une 


9.4249 | 1.75214 
9.4864 | 1.75499 
9.5481 | 1.75784 


9.6100 | 1.76068 


9.6721 | 1.76352 
9.7344 | 1.76635 
9.7969 | 1.76918 


9.8596 | 1.77200 
9.9225 | 1.77482 
9.9856 | 1.77764 


10.0489 | 1.78045 
10.1124 | 1.78326 
10.1761 | 1.78606 


—] 
10.2400 | 1.78885 
| 1 
10.3041 | 1.79165 5.66569 
10.3684 | 1.79444 5.67450 
10.4329 | 1.79722 5.68331 


ооч алж 


© 
© 


---| u| ooo ооо 


CON oos ane 


DARA AAD 


13.4689 | 1.91572 | 6.05805 
13.5424 | 1.91833 | 6.06630 
15.6161 | 1.92094 | 6.07454 


13.6900 | 1.92354 | 6.08276 


13.7641 | 1.92614 | 6.09098 
13.8584 | 1.92873 | 6.09918 
15.9129 | 1.93132 | 6.10737 


13.9876 | 1.93391 | 6.11555 
14.0625 | 1.93649 | 6.12372 
14.1376 | 1.93907 | 6.13188 


14.2129 | 1.94165 | 6.14003 
14.2884 | 1.94422 | 6.14817 
14.3641 | 1.94679 | 6.15630 


14.4400 | 1.94936 | 6.16441 


14.5161 | 1.95192 | 6.17252 
14.5924 | 1.95448 | 6.18061 
14.6689 | 1.95704 | 6.18870 


14.7456 | 1.95959 | 6.19677 
14.8225 | 1.96214 | 6.20484 
14.8996 | 1.96469 | 6.21289 


14.9769 | 1.96723 
15.0544 | 1.96977 
15.1321 | 1.97231 


15.2100 | 1.97484 | 6.24500 


15.2881 | 1.97737 | 6.25300 
15.3664 | 1.97990 | 6.26099 
15.4449 | 1.98242 | 6.26897 


15.5256 | 1.98494 | 6.27694 
15.6025 | 1.98746 | 6.28490 
15.6816 | 1.98997 | 6.29285 


15.7609 | 1.99249 | 6.30079 
15.8404 | 1.99499 | 6.30872 
12.1801 | 1.86815 5.90762 15.9201 | 1.99750 | 6.31664 
12.2500 | 1.87085 5.91608 16.0000 | 2.00000 | 6.32456 
м: VN nN? VN 


CaN бак ave 


© 


10.4976 | 1.80000 5.69210 
10.5625 | 1.80278 5.70088 
10.6276 | 1.80555 5.70964 


10.6929 | 1.80831 5.71839 
10.7584 | 1.81108 5.72713 
10.8241 | 1.81584 5.73585 
———— 


эмм Мммм а бу 
Sat ааа dS 


Qh 


—— IL 
1.81659 | 5.74456 


© 


анай 
1.81934 | 5.75326 
1.82209 | 576194 
1.82483 | 5.77062 


1.82757 5.77927 
1.83050 | 5.78792 
1.85305 5.79655 


1.83576 | 5.80517 
1.83848 | 5.81378 
1.84120 | 5.82257 


oe 
11.5600 | 1.84391 5.83095 


11.6281 | 1.84662 5.83 
11.6964 | 1.84932 584806 
11.7649 | 1.85205 5.85662 


11.8336 | 1.85472 5.81 
11.9025 | 1.85742 5.89215 
11.9716 | 1.86011 5.88218 


12.0409 | 1.86279 5.8901 
12.1104 | 1.86548 589007 


паміма чы аса E9 | мы асы иие 


mmm] & 
ase 


CaN Ana 


ооо ооо 2 оо wim 


аа gi шшш 

nhan вв nak | 

$3 сак ast 
OO A= 


“ 
КА 
© 
Wan 


Pl corer сыя crore | шша шшш 


S| ооо 
e 


Table G. Squares and square roots (continued) 


_ъ_] Ун 7 | 
16.0000 | 2.00000 6.32456 a 20.2500 | 2.12132 | 6.70820 


16.0801 | 2.00250 | 6.33246 Я 20.3401 | 2.12368 | 6.71565 
16.1604 | 2.00499 | 6.54035 Ы 20.4304 | 2.12605 | 6.72509 
16.2409 | 2.00749 | 6.54825 P 20.5209 | 2.12838 | 6.73055 


16.3216 | 2.00998 | 6.35610 Ы 20.6116 | 2.15075 | 6.75795 
16.4025 | 2.01246 | 6.56396 x 20.7025 | 2.13307 | 6.74537 
16.4836 | 2.01494 | 6.57181 Ы 20.7936 | 2.13542 | 6.75278 


16.5649 | 2.01742 | 6.57966 a 20.8849 | 2.13776 | 6.76018 


16.6464 | 2.01990 | 6.38749 a 20.9764 | 2.14009 | 6.76757 
16.7281 | 2.02237 | 6.39531 d 21.0681 | 2.14243 | 6.77495 


16.8100 | 2.02485 | 6.40312 Я 21.1600 | 2.14476 | 6.78255 


16.8921 | 2.02731 | 6.41095 x 21.2521 | 2.14709 | 6.78970 
16.9744 | 2.02978 | 6.41872 21.3444 | 2.14942 | 6.79706 
17.0569 | 2.03224 | 6.42651 ү 21.4369 | 2.15174 | 6.80441 


21.5296 | 2.15407 | 6.81175 
21:6225 | 2.15639 | 6.81909 
21.7156 | 2.15870 | 6.82642 


21.8089 | 2.16102 | 6.83374 
21.9024 | 2.16333 | 5.84105 
21.9961 | 2.16564 | 6.84856 


22.0900 | 2.16795 | 6.85565 


22.1841 | 2.17025 | 6.86294 
22.2784 | 2.17256 | 6.87025 
22.5729 | 2.17486 | 6.87750 


22.4676 | 2.17715 | 6.88477 
22.5625 | 2.17945 | 6.89202 
22.6576 | 2.18174 | 6.89928 


17.1396 | 2.03470 | 6.43428 
17.2225 | 2.03715 | 6.44205 
17.5056 | 2.03961 | 6.44981 


17.3889 | 2.04206 | 6.45755 
17.4724 | 2.04450 | 6.46529 
17.5561 | 2.04695 | 6.47302 


17.6400 | 2.04939 | 6.48074 


17.7241 | 2.05183 | 6.48845 
17.8084 | 2.05426 6.49615 
17.8929 | 2.05670 | 6.50384 


17.9776 | 2.05913 | 6.51155 
18.0625 | 2.06155 | 6.51920 
18.1476 | 2.06398 | 6.52687 


18.2329 | 2.06640 | 6.53452 
18.3184 | 2.06882 | 6.54217 
18.4041 | 2.07123 | 6.54981 


18.4900 | 2.07364 | 6.55744 


18.5761 | 2.07605 | 6.56506 
18.6624 | 2.07846 | 6.57267 
18.7489 | 2.08087 | 6.58027 


18.8356 | 2.08327 | 6.58787 
18.9225 | 2.08567 | 6.59545 
19.0096 | 2.08806 | 6.60305 


19.0969 | 2.09045 | 6.61060 
19.1844 | 2.09284 | 6.61816 
19.2721 | 2.09523 | 6.62571 


19.3600 | 2.09762 | 6.63325 


19.4481 | 2.10000 | 6.64078 
19.5364 | 2.10238 | 6.64851 
19.6249 | 2.10476 | 6.65582 


19.7136 | 2.10713 | 6.66333 
19.8025 | 2.10950 | 6.67085 
19.8916 | 2.11187 | 6.67832 


19.9809 | 2.11424 | 6.68581 
20.0704 | 2.11660 | 6.69328 
20.1601 | 2.11896 | 6.70075 


20.2500 | 2.12132 | 6.70820 l 25.0000 2.23607 7.07107 
i 25 
VN Ne VN 


aa 


Bib Bie 
3|$89 Baz 


A 
» 


Aas ANS 
Oak ane 


22.7529 | 2.18405 | 6.90652 
22.8484 | 2.18632 | 6.91375 
22.9441 | 2.18861 | 6.92098 


23.0400 | 2.19089 | 6.92820 
23.1361 | 2.19517 | 6.95542 


23.2324 | 2.19545 | 6.94262 
25.5289 | 2.19773 | 6.94982 


эз aah ммм 


23.4256 | 2.20000 | 6.95701 
23.5225 | 2.20227 | 6.96419 
23.6196 | 2.20454 | 6.97137 


23.7169 | 2.20681 6.97854 
23.8144 | 2.20907 | 6.98570 
25.9121 | 2.21133 | 6.99285 


24.0100 | 2.21359 | 7.00000 


24.1081 | 2.21585 | 7.00714 
24.2064 | 2.21811 | 7.01427 
24.3049 | 2.22036 7.02140 


24.4036 | 2.22261 7.02851 
24.5025 | 2.22486 7.03562 
24.6016 | 2.22711 7.04273 


24.7009 | 2.22935 | 7.04982 
24.8004 | 2.23159 | 7.05691 
24.9001 | 2.23383 | 7.06399 
Table G. Squares and square roots (continued) 


VN 
2.23607 | 7.07107 


2.23830 | 7.07814 
2.24054 | 7.08520 
2.24277 | 7.09225 


YN | Vion 
2.34521 | 7.41620 


2.54734 | 7.42294 
2.54947 | 7.42967 
2.55160 | 7.43640 


30.6916 | 2.35372 | 7.44312 
30.8025 | 2.35584 | 7.44983 
30.9136 | 2.35797 | 7.45654 


31.0249 | 2.36008 | 7.46324 
31.1364 | 2.36220 | 7.46994 
51.2481 | 2.36432 | 7.47663 


31.3600 | 2.36643 | 7.48331 


31.4721 | 2.36854 | 7.48999 
31.5844 | 2.37065 | 7.49667 
31.6969 | 2.37276 | 7.50333 


31.8096 | 2.37487 | 7.50999 
31.9225 | 2.37697 | 7.51665 
52.0556 | 2.37908 | 7.52330 


32.1489 | 2.38118 | 7.52994 
32.2624 | 2.38328 | 7.53658 
32.3761 | 2.38537 | 7.54321 


32.4900 | 2.38747 | 7.54983 
32.6041 | 2.38956 7.55645 
32.7184 | 2.39165 | 7.56307 
32.8329 | 2.39374 | 7.56968 


52.9476 | 2.59585 7.57628 
33.0625 | 2.39792 7.58288 
33.1776 | 2.40000 | 7.58947 


33.2929 | 2.40208 | 7.59605 
33.4084 | 2.40416 | 7.60263 
33.5241 | 2.40624 7.60920 


33.6400 | 2.40832 7.61577 


33.7561 | 2.41039 7.62234 
33.8724 | 2.41247 7.62889 
33.9889 | 2.41454 7.63544 


34.1056 | 2.41661 7.64199 
34.2225 | 2.41868 7.64853 
34.3396 | 2.42074 | 7.65506 


34.4569 | 2.42281 7.66159 
34.5744 | 2.42487 | 7.66812 
34.6921 | 2.42693 | 7:67463 


34.8100 | 2.42899 | 7.68115 


34.9281 | 2.43105 | 7.68765 
35.0464 | 2.43311 | 7.69415 
35.1649 | 2.43516 | 7.70065 


35.2836 | 2.43721 | 7.70714 
35.4025 | 2.43926 | 7.71362 
55.5216 | 2.44131 | 7.72010 


35.6409 | 2.44336 7.72658 
35.7604 | 2.44540 7.73305 
35.8801 | 2.44745 7.7395] 


=ч oe Kies 
30.2500 | 2.34521 | 7.41620 -l 2.44949 
N: VN VN 


afa 
e 


2.24499 | 7.09930 
2.24722 | 7.10634 
2.24944 | 7.11337 


25.7049 3 7.12039 
25.8064 Я 7.12741 
25.9081 i 7.15442 


26.0100 x 7.14143 


26.1121 s 7.14843 
26.2144 .262 7.15542 
26.3169 .2 7.16240 


26.4196 7.16938 
26.5225 7.17635 
26.6256 | 2.27156 | 7.18331 


26.7289 | 2.27376 | 7.19027 
26.8324 | 2.27596 | 7.19722 
26.9361 | 2.27816 | 7.20417 


27.0400 | 2.28035 7.21110 


27.1441 | 2.28254 7.21803 
27.2484 | 2.28473 7.22496 
27.5529 | 2.28692 7.23187 


27.4576 | 2.28910 7.23878 
27.5625 | 2.29129 7.24569 
27.6676 | 2.29347 7.25259 


27.7729 | 2.29565 7.25948 
27.8784 | 2.29783 7.26636 
27.9841 | 2.30000 7.27324 


28.0900 | 2.30217 7.28011 
28.1961 | 2.30434 7.2869! 

28.3024 | 2.30651 7.29385 
28.4089 | 2.30868 7.30068 


28.5156 | 2.31084 7.307. 
28.6225 | 2.31301 75145) 
28.7296 | 2.31517 7.52120 


28.8369 | 2.31733 7.32803 
28.9444 | 2.31948 7.53485 
29.0521 | 2.32164 7.54166 


29.1600 | 2.32379 
eee Ва 
29.2681 | 2.32594 | 7.35527 
29.5764 | 2.32809 | 736206 
29.4849 | 2.53024 | 736885 


29.5936 | 2.33238 7.3; 
29.7025 | 2.33452 23821 
29.8116 | 2.33666 7.38918 


29.9209 | 2.33880 7.3959. 
30.0304 | 2.34094 740270 
30.1401 | 2.34307 7.40945 


ooo ooo boc 
mamn ^on man |o 
Hint dininin nio 
$95 AnA Whe 


c0 AnH 


e 


eoo corer] 9 | лош 


mn Mam ооо | g 
An an 
ӯ 88 


Bee bE LEE] RB 


я | тоот 
8 WON AAR Whe 
3|28 


У oXuu[alao 


юю bbb bii 
SEN SRP RSE 
ama mann non]! [ш 
SOS Qa 
ааа ane 


5. 
5. 
5. 
5. 
5. 
5. 
5 
5 
5. 


Ууу 
CoN 


e 
олоо | бх 
ообо боёоо | & 


man Maa! я 
OUR Wee 


ы ышы аһыы | 
Sal age eee 


CaN 


| | шы 
panj njana ono 


bio] & | mmm 
a © 


сок abe 
ON OUR ANE 


AA man aaa]! a! nae 
ban han m 

ON 

сыл mmo 

wo o0 


b 
io 
© 




Table G. Squares and square roots (continued) 


N: VN N? VN 
36.0000 | 2.44949 | 7.74597 es 42.2500 | 2.54951 | 8.06226 


36.1201 | 2.45153 | 7.75242 x 42.3801 8.06346 
36.2404 | 2.45357 | 7.75887 e 104 8.07465 
36.3609 | 2.45561 | 7.76531 x 8.08084 


36.4816 | 2.45764 | 7.77174 Я 8.08705 
36.6025 | 2.45967 | 7.77817 z 8.09521 
36.7236 | 2.46171 | 7.78460 x x x 8.09938 


36.8449 | 2.46374 | 7.79102 Я 45.1649 8.10555 
36.9664 | 2.46577 | 7.79744 А 45.2964 8.11172 
37.0881 | 2.46779 | 7.80385 x 43.4281 s 8.11788 


37.2100 | 2.46982 | 7.81025 E 43.5600 | 2.56905 | 8.12404 


37.3321 | 2.47184 | 7.81665 X 43.6921 | 2.57099 | 8.15019 
37.4544 | 2.47386 | 7.82304 х 43.8244 | 2.57294 | 8.13634 
37.5769 | 2.47588 | 7.82945 X 43.9569 | 2.57488 | 8.14248 


37.6996 | 2.47790 | 7.83582 .6 44.0896 | 2.57682 | 8.14862 
37.8225 | 2.47992 | 7.84219 i 44.2225 | 2.57876 | 8.15475 
37.9456 | 2.48193 | 7.84857 Я 44.5556 | 2.58070 | 8.16088 


58.0689 | 2.48395 | 7.85493 ы 44.4889 | 2.58265 | 8.16701 
38.1924 | 2.48596 | 7.86130 H 44.6224 457 | 8.17515 
38.3161 | 2.48797 | 7.86766 i 44.7561 650 | 8.17924 


38.4400 | 2.48998 | 7.87401 A 44.8900 | 2.58844 | 8.18555 


38.5641 | 2.49199 | 7.88036 45.0241 | 2.59037 | 8.19146 
38.6884 | 2.49399 | 7.88670 45.1584 | 2.59230 | 8.19756 
38.8129 | 2.49600 | 7.89305 45.2929 | 2.59422 | 8.20366 


38.9376 | 2.49800 | 7.89957 45.4276 | 2.59615 | 8.20975 
39.0625 | 2.50000 | 7.90569 45.5625 | 2.59808 | 8.21584 
39.1876 | 2.50200 | 7.91202 45.6976 | 2.60000 | 8.22192 


45.8329 | 2.60192 | 8.22800
45.9684 | 2.60384 | 8.23408
46.1041 | 2.60576 | 8.24015
46.2400 | 2.60768 | 8.24621
46.3761 | 2.60960 | 8.25227
46.5124 | 2.61151 | 8.25833
46.6489 | 2.61343 | 8.26438
46.7856 | 2.61534 | 8.27043
46.9225 | 2.61725 | 8.27647
47.0596 | 2.61916 | 8.28251
47.1969 | 2.62107 | 8.28855
47.3344 | 2.62298 | 8.29458
47.4721 | 2.62488 | 8.30060
47.6100 | 2.62679 | 8.30662
47.7481 | 2.62869 | 8.31264
47.8864 | 2.63059 | 8.31865
48.0249 | 2.63249 | 8.32466
48.1636 | 2.63439 | 8.33067
48.3025 | 2.63629 | 8.33667
48.4416 | 2.63818 | 8.34266
48.5809 | 2.64008 | 8.34865
48.7204 | 2.64197 | 8.35464




39.3129 | 2.50400 | 7.91833 
39.4384 | 2.50599 | 7.92465 
39.5641 | 2.50799 | 7.93095


39.6900 | 2.50998 | 7.93725 


39.8161 | 2.51197 | 7.94355
39.9424 | 2.51396 | 7.94984 
40.0689 | 2.51595 | 7.95613


40.1956 | 2.51794 | 7.96241
40.3225 | 2.51992 | 7.96869
40.4496 | 2.52190 | 7.97496


40.5769 | 2.52389 | 7.98123
40.7044 | 2.52587 | 7.98749
40.8321 | 2.52784 | 7.99375


40.9600 | 2.52982 | 8.00000


41.0881 | 2.53180 | 8.00625 
41.2164 | 2.53377 | 8.01249 
41.3449 | 2.53574 | 8.01873 


41.4736 | 2.53772 | 8.02496 
41.6025 | 2.53969 | 8.03119 
41.7316 | 2.54165 | 8.03741 


41.8609 | 2.54362 | 8.04363 
41.9904 | 2.54558 | 8.04984 
42.1201 | 2.54755 | 8.05605 48.8601 | 2.64386 | 8.36062 


42.2500 | 2.54951 | 8.06226    49.0000 | 2.64575 | 8.36660
N² | √N | √10N    N² | √N | √10N
Table G. Squares and square roots (continued)


N² | √N | √10N
49.0000 | 2.64575 | 8.36660
49.1401 | 2.64764 | 8.37257
49.2804 | 2.64953 | 8.37854
49.4209 | 2.65141 | 8.38451
49.5616 | 2.65330 | 8.39047
49.7025 | 2.65518 | 8.39643
49.8436 | 2.65707 | 8.40238


N² | √N | √10N
56.2500 | 2.73861 | 8.66025 


56.4001 | 2.74044 | 8.66603 
56.5504 | 2.74226 | 8.67179 




56.7009 | 2.74408 | 8.67756


56.8516 | 2.74591 | 8.68332 
57.0025 | 2.74773 | 8.68907 
57.1536 | 2.74955 | 8.69483 


57.3049 | 2.75136 | 8.70057 
57.4564 | 2.75318 | 8.70632 
57.6081 | 2.75500 | 8.71206 


57.7600 | 2.75681 | 8.71780 


57.9121 | 2.75862 | 8.72353 
58.0644 | 2.76043 | 8.72926 
58.2169 | 2.76225 | 8.73499 


58.3696 | 2.76405 | 8.74071 
58.5225 | 2.76586 | 8.74643
58.6756 | 2.76767 | 8.75214 


58.8289 | 2.76948 | 8.75785 
58.9824 | 2.77128 | 8.76356 
59.1361 | 2.77308 | 8.76926 


59.2900 | 2.77489 | 8.77496 
59.4441 | 2.77669 | 8.78066 
59.5984 | 2.77849 | 8.78635 
59.7529 | 2.78029 | 8.79204 


59.9076 | 2.78209 | 8.79773 
60.0625 | 2.78388 | 8.80341 
60.2176 | 2.78568 | 8.80909 


60.3729 | 2.78747 | 8.81476
60.5284 | 2.78927 | 8.82043 
60.6841 | 2.79106 | 8.82610 


60.8400 | 2.79285 | 8.83176 


60.9961 | 2.79464 | 8.83742 
61.1524 | 2.79643 | 8.84308 
61.3089 | 2.79821 | 8.84873 


61.4656 | 2.80000 | 8.85438 
61.6225 | 2.80179 | 8.86002 
61.7796 | 2.80357 | 8.86566


61.9369 | 2.80535 | 8.87130
62.0944 | 2.80713 | 8.87694
62.2521 | 2.80891 | 8.88257


62.4100 | 2.81069 | 8.88819 


62.5681 | 2.81247 | 8.89382 
62.7264 | 2.81425 | 8.89944 
62.8849 | 2.81603 | 8.90505 


63.0436 | 2.81780 | 8.91067 
63.2025 | 2.81957 | 8.91628 
63.3616 | 2.82135 | 8.92188 


63.5209 | 2.82312 | 8.92749 
63.6804 | 2.82489 | 8.93308
63.8401 | 2.82666 | 8.93868


64.0000 | 2.82843 | 8.94427
N² | √N | √10N




49.9849 | 2.65895 | 8.40833
50.1264 | 2.66083 | 8.41427
50.2681 | 2.66271 | 8.42021
50.4100 | 2.66458 | 8.42615
50.5521 | 2.66646 | 8.43208
50.6944 | 2.66833 | 8.43801
50.8369 | 2.67021 | 8.44393
50.9796 | 2.67208 | 8.44985
51.1225 | 2.67395 | 8.45577
51.2656 | 2.67582 | 8.46168
51.4089 | 2.67769 | 8.46759
51.5524 | 2.67955 | 8.47349
51.6961 | 2.68142 | 8.47939
51.8400 | 2.68328 | 8.48528
51.9841 | 2.68514 | 8.49117
52.1284 | 2.68701 | 8.49706
52.2729 | 2.68887 | 8.50294
52.4176 | 2.69072 | 8.50882
52.5625 | 2.69258 | 8.51469
52.7076 | 2.69444 | 8.52056
52.8529 | 2.69629 | 8.52643
52.9984 | 2.69815 | 8.53229
53.1441 | 2.70000 | 8.53815
53.2900 | 2.70185 | 8.54400
53.4361 | 2.70370 | 8.54985
53.5824 | 2.70555 | 8.55570
53.7289 | 2.70740 | 8.56154
53.8756 | 2.70924 | 8.56738
54.0225 | 2.71109 | 8.57321
54.1696 | 2.71293 | 8.57904
54.3169 | 2.71477 | 8.58487
54.4644 | 2.71662 | 8.59069
54.6121 | 2.71846 | 8.59651
54.7600 | 2.72029 | 8.60233
54.9081 | 2.72213 | 8.60814
55.0564 | 2.72397 | 8.61394
55.2049 | 2.72580 | 8.61974
55.3536 | 2.72764 | 8.62554
55.5025 | 2.72947 | 8.63134
55.6516 | 2.73130 | 8.63713
55.8009 | 2.73313 | 8.64292
55.9504 | 2.73496 | 8.64870
56.1001 | 2.73679 | 8.65448
56.2500 | 2.73861 | 8.66025
N² | √N | √10N
Table G. Squares and square roots (continued)


N² | √N | √10N    N² | √N | √10N
64.0000 | 2.82843 | 8.94427    72.2500 | 2.91548 | 9.21954
64.1601 | 2.83019 | 8.94986    72.4201 | 2.91719 | 9.22497
64.3204 | 2.83196 | 8.95545    72.5904 | 2.91890 | 9.23038
64.4809 | 2.83373 | 8.96103    72.7609 | 2.92062 | 9.23580
64.6416 | 2.83549 | 8.96660    72.9316 | 2.92233 | 9.24121
64.8025 | 2.83725 | 8.97218    73.1025 | 2.92404 | 9.24662
64.9636 | 2.83901 | 8.97775    73.2736 | 2.92575 | 9.25203
65.1249 | 2.84077 | 8.98332    73.4449 | 2.92746 | 9.25743
65.2864 | 2.84253 | 8.98888    73.6164 | 2.92916 | 9.26283
65.4481 | 2.84429 | 8.99444    73.7881 | 2.93087 | 9.26823
65.6100 | 2.84605 | 9.00000    73.9600 | 2.93258 | 9.27362
65.7721 | 2.84781 | 9.00555    74.1321 | 2.93428 | 9.27901
65.9344 | 2.84956 | 9.01110    74.3044 | 2.93598 | 9.28440
66.0969 | 2.85132 | 9.01665    74.4769 | 2.93769 | 9.28978
66.2596 | 2.85307 | 9.02219    74.6496 | 2.93939 | 9.29516
66.4225 | 2.85482 | 9.02774    74.8225 | 2.94109 | 9.30054
66.5856 | 2.85657 | 9.03327    74.9956 | 2.94279 | 9.30591
66.7489 | 2.85832 | 9.03881    75.1689 | 2.94449 | 9.31128
66.9124 | 2.86007 | 9.04434    75.3424 | 2.94618 | 9.31665
67.0761 | 2.86182 | 9.04986    75.5161 | 2.94788 | 9.32202
67.2400 | 2.86356 | 9.05539    75.6900 | 2.94958 | 9.32738
67.4041 | 2.86531 | 9.06091    75.8641 | 2.95127 | 9.33274
67.5684 | 2.86705 | 9.06642    76.0384 | 2.95296 | 9.33809
67.7329 | 2.86880 | 9.07193    76.2129 | 2.95466 | 9.34345
67.8976 | 2.87054 | 9.07744    76.3876 | 2.95635 | 9.34880
68.0625 | 2.87228 | 9.08295    76.5625 | 2.95804 | 9.35414
68.2276 | 2.87402 | 9.08845    76.7376 | 2.95973 | 9.35949
68.3929 | 2.87576 | 9.09395    76.9129 | 2.96142 | 9.36483
68.5584 | 2.87750 | 9.09945    77.0884 | 2.96311 | 9.37017
68.7241 | 2.87924 | 9.10494    77.2641 | 2.96479 | 9.37550
68.8900 | 2.88097 | 9.11043    77.4400 | 2.96648 | 9.38083
69.0561 | 2.88271 | 9.11592    77.6161 | 2.96816 | 9.38616
69.2224 | 2.88444 | 9.12140    77.7924 | 2.96985 | 9.39149
69.3889 | 2.88617 | 9.12688    77.9689 | 2.97153 | 9.39681
69.5556 | 2.88791 | 9.13236    78.1456 | 2.97321 | 9.40213
69.7225 | 2.88964 | 9.13783    78.3225 | 2.97489 | 9.40744
69.8896 | 2.89137 | 9.14330    78.4996 | 2.97658 | 9.41276
70.0569 | 2.89310 | 9.14877    78.6769 | 2.97825 | 9.41807
70.2244 | 2.89482 | 9.15423    78.8544 | 2.97993 | 9.42338
70.3921 | 2.89655 | 9.15969    79.0321 | 2.98161 | 9.42868
70.5600 | 2.89828 | 9.16515    79.2100 | 2.98329 | 9.43398
70.7281 | 2.90000 | 9.17061    79.3881 | 2.98496 | 9.43928
70.8964 | 2.90172 | 9.17606    79.5664 | 2.98664 | 9.44458
71.0649 | 2.90345 | 9.18150    79.7449 | 2.98831 | 9.44987
71.2336 | 2.90517 | 9.18695    79.9236 | 2.98998 | 9.45516
71.4025 | 2.90689 | 9.19239    80.1025 | 2.99166 | 9.46044
71.5716 | 2.90861 | 9.19783    80.2816 | 2.99333 | 9.46573
71.7409 | 2.91033 | 9.20326    80.4609 | 2.99500 | 9.47101
71.9104 | 2.91204 | 9.20869    80.6404 | 2.99666 | 9.47629
72.0801 | 2.91376 | 9.21412    80.8201 | 2.99833 | 9.48156
72.2500 | 2.91548 | 9.21954    81.0000 | 3.00000 | 9.48683
N² | √N | √10N    N² | √N | √10N
Table G. Squares and square roots (continued)


N² | √N | √10N    N² | √N | √10N
81.0000 | 3.00000 | 9.48683    90.2500 | 3.08221 | 9.74679


81.1801 | 3.00167 | 9.49210
81.3604 | 3.00333 | 9.49737
81.5409 | 3.00500 | 9.50263


90.4401 | 3.08383 | 9.75192
90.6304 | 3.08545 | 9.75705
90.8209 | 3.08707 | 9.76217


81.7216 | 3.00666 | 9.50789
81.9025 | 3.00832 | 9.51315
82.0836 | 3.00998 | 9.51840


82.2649 | 3.01164 | 9.52365
82.4464 | 3.01330 | 9.52890
82.6281 | 3.01496 | 9.53415 


82.8100 | 3.01662 | 9.53939 


82.9921 | 3.01828 | 9.54463 
83.1744 | 3.01993 | 9.54987 
83.3569 | 3.02159 | 9.55510


83.5396 | 3.02324 | 9.56033 
83.7225 | 3.02490 | 9.56556 
83.9056 | 3.02655 | 9.57079 


84.0889 | 3.02820 | 9.57601 
84.2724 | 3.02985 | 9.58123 
84.4561 | 3.03150 | 9.58645 


84.6400 | 3.03315 | 9.59166 


91.0116 | 3.08869 | 9.76729 
91.2025 | 3.09031 | 9.77241
91.3936 | 3.09192 | 9.77753


91.5849 | 3.09354 | 9.78264 
91.7764 | 3.09516 | 9.78775 
91.9681 | 3.09677 | 9.79285 


92.1600 | 3.09839 | 9.79796 


92.3521 | 3.10000 | 9.80306 
92.5444 | 3.10161 | 9.80816 
92.7369 | 3.10322 | 9.81326 


92.9296 | 3.10483 | 9.81835 
93.1225 | 3.10644 | 9.82344 
93.3156 | 3.10805 | 9.82853 


93.5089 | 3.10966 | 9.83362 
93.7024 | 3.11127 | 9.83870 
93.8961 | 3.11288 | 9.84378 


94.0900 | 3.11448 | 9.84886 
84.8241 | 3.03480 | 9.59687    94.2841 | 3.11609 | 9.85393
85.0084 | 3.03645 | 9.60208    94.4784 | 3.11769 | 9.85901
85.1929 | 3.03809 | 9.60729    94.6729 | 3.11929 | 9.86408
85.3776 | 3.03974 | 9.61249    94.8676 | 3.12090 | 9.86914
85.5625 | 3.04138 | 9.61769    95.0625 | 3.12250 | 9.87421
85.7476 | 3.04302 | 9.62289    95.2576 | 3.12410 | 9.87927
85.9329 | 3.04467 | 9.62808    95.4529 | 3.12570 | 9.88433
86.1184 | 3.04631 | 9.63328    95.6484 | 3.12730 | 9.88939
86.3041 | 3.04795 | 9.63846    95.8441 | 3.12890 | 9.89444
86.4900 | 3.04959 | 9.64365    96.0400 | 3.13050 | 9.89949
86.6761 | 3.05123 | 9.64883    96.2361 | 3.13209 | 9.90454
86.8624 | 3.05287 | 9.65401    96.4324 | 3.13369 | 9.90959
87.0489 | 3.05450 | 9.65919    96.6289 | 3.13528 | 9.91464
87.2356 | 3.05614 | 9.66437    96.8256 | 3.13688 | 9.91968
87.4225 | 3.05778 | 9.66954    97.0225 | 3.13847 | 9.92472
87.6096 | 3.05941 | 9.67471    97.2196 | 3.14006 | 9.92975
87.7969 | 3.06105 | 9.67988    97.4169 | 3.14166 | 9.93479
87.9844 | 3.06268 | 9.68504    97.6144 | 3.14325 | 9.93982
88.1721 | 3.06431 | 9.69020    97.8121 | 3.14484 | 9.94485
88.3600 | 3.06594 | 9.69536    98.0100 | 3.14643 | 9.94987
88.5481 | 3.06757 | 9.70052    98.2081 | 3.14802 | 9.95490
88.7364 | 3.06920 | 9.70567    98.4064 | 3.14960 | 9.95992
88.9249 | 3.07083 | 9.71082    98.6049 | 3.15119 | 9.96494
89.1136 | 3.07246 | 9.71597    98.8036 | 3.15278 | 9.96995
89.3025 | 3.07409 | 9.72111    99.0025 | 3.15436 | 9.97497
89.4916 | 3.07571 | 9.72625    99.2016 | 3.15595 | 9.97998
89.6809 | 3.07734 | 9.73139    99.4009 | 3.15753 | 9.98499
89.8704 | 3.07896 | 9.73653    99.6004 | 3.15911 | 9.98999
90.0601 | 3.08058 | 9.74166    99.8001 | 3.16070 | 9.99500
90.2500 | 3.08221 | 9.74679    100.000 | 3.16228 | 10.0000
N² | √N | √10N    N² | √N | √10N


Alienation, coefficient of, 127-128 
Analysis of variance, 252-373 
applications for significance: 
of correlation, linear, 272-275, 280 
of correlation ratio, 270-271, 280 
of differences: 
for correlated means, 294-296, 
322-323, 329, 337 
for independent means, 265-269, 
321, 337 
for trends, 347, 352-356 
of interaction, 306, 312-314, 332- 
335, 337, 340 
of multiple correlation, 281-284 
of nonlinearity, 275-278, 280 
of reliability, 297, 300 
assumptions: 
homogeneity of variances, 252, 
265, 315-317, 337-338 
independent variance estimates, 
246, 252, 256 
normality, 252, 264-265, 311, 332 
violations, effect of, 252 
classifications: 
higher, 339-340 
one-way or simple, 253, 288
three-way or triple, 318-323
two-way or double, 288, 290-294
computation: 
groups of unequal size, 269-270 
simple classification, 265-267 
three-way classification, 323-329 
two-way classification, 301-307 
covariance method, 362-373 
computation, 368-370 
and correlation, 365-366 
degrees of freedom, 365, 368 
multiple, 372
regression adjustments, 366-368, 
371 
situations for use, 362, 371-372 
sum of products, 365 
degrees of freedom, 255, 293, 321- 
322 
error term for F, 309-315, 331-337, 
340 
factorial design, 340 
interaction, 290, 303 
higher, 340 
illustrations of, 307-309 
three-way, 321 
two-way, 306 
Latin square design, 341-344 
models, 262-263, 309-311, 331, 341 
fixed effects, 263, 309 
mixed, 309 
random, 262, 309 
pooling, 338-339 
preliminary tests, 338 
by ranks, 378-379 
significant F, meaning of, 264 
sum of squares, breakdown of, 253- 
255, 290-293, 322 
variance estimates, 89 
between-groups, 255 
expected value of, 257, 262, 310, 
311-314, 332-337, 342, 343 
interaction, 303 
meaning of, 256-264 
remainder, 293 
residual, 274, 293 
within-cells, 303, 330 
within-groups, 255 
Arbitrary origin, 16

Area sampling, 384 
Arkin, H., 12n 

Array, 110, 116 
Attenuation, 153-154, 208 
Attributes, 51 

Average, 1, 14-18 
Average deviation, 20 


Bartlett’s test, 249-250 
Best-fit line, 119-123 
Beta (β) coefficients, 172
Binomial distribution, 41-46 
and chi square, 210-212 
and hypothesis testing, 46-51 
kurtosis of, 43 
mean of, 43 
and normal curve, 43-46
and probability, 42 
skewness of, 43 
standard deviation of, 43 
Biserial correlation, 189-193 
Boneau, C. A., 106 
Brinton, W. C., 12n 
Brown-Spearman formula, 150, 208, 299-300


Central value (tendency), 13 
mean, 16-18 
median, 14-15 
mode, 14 
Changes, evaluation of: 
for categorical data, 52-55, 224-226 
by covariance method, 373 
for graduated series, 76-77, 80-83, 
101-102, 373 
Chesire, L., 195n 
Chi square (χ²), 198, 209
additive property of, 222-223 
applications as test: 
of agreement with a priori frequencies, 219
of changes, 224-226 
of correlated proportions, 224-226, 
227-228 
of correlation, 219, 221 
of goodness of fit, 220, 231-235 
of group differences, 219-224, 228-231


INDEX 


of independence, 219, 220-221 
assumptions, 217-219 
and binomial, 210-212 
combining of, 222-223 
continuity correction, 226-227 
degrees of freedom, 212-214, 234- 
235 
and discontinuity, 211 
distribution of, 214-217 
and F, 250 
and normal curve, 217, 250 
and null hypothesis, 216-217 
one- vs. two-tailed tests, 227 
and proportions, 224 
table of, 428-429 
and variance, 243-244 
and z, or x/σ, 211, 214, 224, 225-
226, 244, 250 
Cochran, W. G., 344 
Coded scores, 18, 22 
Colton, R. R., 12n 
Combined groups: 
mean for, 18 
standard deviation for, 24 
Common elements and correlation, 132 
Comparison of groups, 79; see also 
Significance, of differences 
Concordance coefficient, 379-381 
Confidence coefficient, 92 
Confidence interval, 89-92 
for correlation, 139 
for difference, 92-93, 104 
for mean, 92, 101 
for variance, 245-246 
Confidence level, 92 
Confidence limits, see Confidence in- 
terval 
Confounded, 333 
Contingency coefficient, 198-201 
Contingency table, 198, 219-220 
Continuity, correction for, 45, 51, 54, 
55, 226-227 
Continuous series, 5 
Correction: 
for attenuation, 153-154, 208 
for continuity, 45, 51, 54, 55, 226- 
227 


for grouping, 24
for uncontrolled variable, 362-373 
Correlation and causation, 132 
Correlation between: 
categorized variables, 193-202 
dichotomized and graduated variables, 
189-193 
dichotomized variables, 193-198, 201 
gain and initial, 158-161
indexes, 162-163
means, 84
point variables, 197-198
standard deviations, 84 
sums or averages, 206-208 
Correlation: 
factors affecting: 
errors of measurement, 153-154 
heterogeneity, 144-145 
third variable, 164-168, 366
indexes, 162-163 
part-whole, 164 
range of talent, 144-145 
sampling errors, 137-139 
selection, 136-137 
measures of: 
biserial, 189-192 
contingency, 198-201 
correlation ratio (eta), 202-203, 
278-279 
fourfold point, 197-198 
intraclass, 284-285, 299 
multiple, 169-187; see also Multi- 
ple correlation 
part, 167-168 
partial, 164-167 
point biserial, 192-193 
product moment, 112-135; see also 
Product moment correlation 
rank, 203-205, 379-381 
tetrachoric, 193-197 
Correlation ratio (eta), 202-203 
computation of, 278-279 
sampling significance of, 270-271, 280 
Correlations, averaging of, 140
Covariance, 363; see also Analysis of
variance
Cox, G. M., 344
Crespi, L., 230




Critical ratio (CR, or z), 50, 54 
and chi square, 211, 224, 225-226, 
244, 250 
and F, 250-251
and t, 99, 102-103
Critical region, 65 
Cumulative frequency distribution, 9 
Curvilinearity, test of, 275-278, 356- 
361 


Decile, 19 
Degrees of freedom: 
for chi square, 212-214, 234-235 
for F, 247 
for t test: 
for means, 99-101, 104 
for r, 138 
for variance estimate, 99-101 
in analysis of variance, 255, 272-273, 
281, 293, 321-322 
Deming, W. E., 384n 
Differences, see Significance, of differ- 
ences 
Discontinuity, see Continuity 
Discrete series, 5, 37 
Discriminant function, 205-206 
Distribution: 
binomial, 41-46 
chi square, 214-217 
cumulative, 9 
expected, 37 
F, 247 
frequency, 6 
joint, 57 
mathematical, 37 
normal, 30-35 
observed, 37 
population, 37 
sampling, 50, 74 
t, 99
theoretical, 37 
Distribution-free methods, 374-381 
chi square as, 375 
Friedman test, 378-379 
Kendall's W, 379-381 
Kolmogorov-Smirnov test, 235 
Kruskal-Wallis test, 378 
Mann-Whitney U test, 377-378 
“median” test, 376 
sign test, 376 
Doolittle method, 180-184
Duncan multiple range test, 286n 


Edwards, A. L., 339n 
Elderton's table for chi square, 217 
Error: 
absolute, 145 
constant, 145 
in drawing conclusions, 63-69 
of estimate, 124-128, 173-175
of measurement, 145-148, 296-299 
reduction, 84-85, 382-387 
relative, 146 
sampling, see Standard error 
standard, see Standard error 
type I and type II, 64-68 
variable, 145 
Estimate, error of, 124-128, 173-175
Estimation:
interval, 89-93
point, 89, 241-243
Estimator: 
consistency, 89 
efficiency, 89 
unbiased, 89, 241-243 
Eta (η), 202-203
computation of, 278-279
sampling significance of, 270-271, 280
Expected value, 241, 243, 257
Ezekiel, M., 186


F, or variance ratio, 247 
and chi square, 250 
degrees of freedom, 247 
distribution, 247 
error term for, 309-315, 331-337, 
340 
for group variances, 248 
of independent estimates, 246-250 
and t, 251, 268, 274, 295
table of, 431-433
and z, or x/σ, 250-251
Factorial design, 340 
Fiducial limits, 92 
Finite universe, 93-94 
Fisher, R. A., 64, 139, 246, 373, 427-433
Fitting of line, 119-123 




Form vs. form reliability, 151, 296-297, 
314-315 
Fourfold point correlation, 197-198 
Fourfold table, 53 
and changes, 53-54, 225 
chi square for, 201, 220 
and contingency, 198, 201 
exact probability for, 236-239 
and point correlation, 197-198
and tetrachoric r, 193-197 
Frequency: 
as area, 8-9 
comparison, see Chi square 
cumulative, 9 
curve, 8 
distribution, 6 
polygon, 7 
table, 6 
Friedman test, 378-379 


Goodness of fit, 220, 231-235 
Graduated series, 5 
Graphic presentation, 7-12 
histogram, 7 
line graph, 11 
ogive, 10 
polygon, 7 
Grouping, 6 
and coding, 18 
correction for, 24 
Guessed average, 17 


Heterogeneity and correlation, 144-145,
164-168, 366
Histogram, 7
Homoscedasticity, 124
test of, 249-250
Horst, P., 339n 
Hypotheses, 47, 61 
alternate, 61 
null, 52, 61 
one- vs. two-tailed, 61-63 
research, 61 
statistical, 61 


Independence, test of, 219 
Indexes: 
correlation of, 162-163
mean of, 162 
standard deviation of, 162 




Interaction, 290, 303, 307-309, 321, 
339-340 
and correlation, 316-317 
and group profiles, 337 
and trends, 347 
Intervals, grouping, 6 
Intraclass correlation, 284-285, 299


Joint occurrences, 56-58 


Kelley, T. L., 180, 201 

Kendall, M. G., 204 

Kendall's W (concordance), 379-381 
Kolmogorov-Smirnov test, 235 
Kruskal-Wallis test, 378 

Kurtosis, 13, 25, 26 


Latin square design, 341-344 
Level: 
of confidence, 92, 93 
of significance, 48, 63-69, 93 
Lewis, D., 357 
Lindquist, E. F., 252n 
Line graph, 11 
Linear component, 350 
Linearity of regression, 120, 128 
test for, 275-278 


McCall, W. A., 36 
Mann-Whitney U test, 377-378 
Matched groups by means of:
matched distributions, 385
paired cases, 82, 85, 385
randomization, 386
siblings and twins, 82, 386-387
Mean, 16-18
for combined groups, 18
computation, 16-1
sampling error of, 74-75, 98, 240-241, 384
Mean difference, significance of, 76-77,
80-83, 101-102
Measurement:
levels of, 374-375
and permissible statistics, 375
Measurement errors, 145-148, 296-299
for change scores, 155-158
for difference scores, 155-158
effect on:
comparison of means, 154-155
correlation, 153-154
matching of groups, 161-162
slopes, 155

and regression, 158-161 

Median, 14-15 

"Median" test, 376 

Mode, 14 

Models in analysis of variance, 262-263, 

309-311, 331, 341 

Moments, 25 

Moving averages, 8 

Multiple correlation, 174-175, 178, 180 

in covariance, 372 

and determinants, 179-180 

and diminishing returns, 186 

and discriminant function, 205 
Doolittle method, 180-184 

error of estimate, 173-174 
interpretation of, 175 

limitations, 185-186 

notation, 187 

numerical solution, 180-184 
regression equations, 171-173, 177 
relative weights, 175-177 
sampling error of, 184, 281-284 
selection fallacy, 185 

and shrinkage, 184-185, 283-284 
and suppressant variable, 186-187 


Nonlinearity, test of, 275-278 
Nonparametric methods, 374-381; see 
also Distribution-free methods 
Normal correlation, 133 
Normal distribution curve, 30-35 
area under, 33 
equations for, 30 
and probability, 44-46 
table of, 33-34, 424-425 
unit form of, 30 
Norton, D. W., 252n 
Notation, 96-97, 139 
Null hypothesis, 52, 61 


Ogive, 10 
One- vs. two-tailed tests, 61-63, 105
binomial, 48-49, 56, 211
chi square, 227, 239
fourfold table, 237-239
F ratio, 248-249
Orthogonal polynomials, 347 
Paired cases, 82, 85, 384-385
Parameter, 2 
Part correlation, 167-168 
Partial correlation, 165-167 
sampling error of, 167 
Part-whole correlation, 164 
Paterson, D. G., 180n 
Paull, A. E., 339 
Pearson, K., 29, 133n, 194, 217n, 239 
Percentage, see Proportion 
Percentile, 19-20 
Peters, C. C., 105 
Point biserial correlation, 192-193 
Point series (variable), 5, 37 
Polynomial forms, 361 
Power of a test, 67 
Prediction, error of, 124-128, 173-175 
Probability, 39-40 
addition theorem, 40 
approximations to, 44-46 
as area, 46 
and binomial, 41-46 
and hypothesis testing, 46-49 
of joint occurrence, 57 
as level of significance, 48 
multiplication theorem, 40 
of type I error, 64 
of type II error, 64-69 
Probable error, 96 
Product moment correlation, 112 
assumptions, 120, 124, 128-129, 131,
134-135
computation, 112-115 
direction of, 127 
interpretations, in terms of: 
common elements, 132 
error of estimate, 124-128 
normal surface, 133 
rate of change, 124 
variance explained, 131 
limits for, 133-134, 154, 167 
sampling error of, 137-139, 272-275 
scatter diagram, 110-111, 116-119 
Profiles and interaction, 337 
Proportion, sampling error of, 50-52 
Proportions as means, 95 


Quadratic component, 357-360 
Quartile, 19 
Quartile deviation, 19 




Quota sampling, 384 


Random sampling, 51, 73, 382-383 
Randomization, 386 
Range, 6, 19 
Rank correlation, 203-205 
Kendall's tau, 204 
Kendall's W, 379-381 
Spearman's rho, 203 
Ranks, mean and variance of, 376-377 
Regressed scores, 160-161 
Regression, 123 
coefficient, 123 
equations, 123, 171-173 
test of linearity, 275-278 
Relative deviate, 32 
Reliability, 145-152, 296-301
and attenuation, 153-154, 208 
of average scores, 297-298 
of change scores, 155-158 
coefficient of, 146 
of difference scores, 155-158 
error of measurement, 147-148 
form vs. form, 151, 296-297, 314-315 
and intraclass r, 299 
range, effect on, 152 
via repeated measurements, 297-299 
significance of, 297, 300 
split-half, 150 
test-retest, 149 
Renshaw, M. J., 305 
Replication, 310 
Residuals, 130, 174, 272 


Saffir, M., 195n 
Sampling, 51, 73-74 
distribution, 50, 73-74 
binomial as, 50 
of chi square, 214-217 
empirical demonstration of, 71-73 
of F, 247 
of t, 99 
errors, reduction of, 84-85, 382-387 
for experimental and control groups,
85, 384-387 
from finite universe, 93-94 
independence of units, 93, 218-219 
size required, 68, 85-86, 109
from skewed universe, 94-95, 107, 
252 


successive, 74
techniques, 382-384 
area, 384 
quota, 384 
random, 51, 73, 382-383 
stratified, 383-384 
systematic, 383 
theory, 51-52, 73-75 
variance, 74 
Scales of measurement, 374-375 
Scatter diagram, 110-111, 116-119 
Scheffé, H., 286, 345 
Selected contrasts or comparisons, 285-287, 345
Shrinkage of multiple r, 184-185, 283- 
284 
Siegel, S., 239 
Sign test, 376 
Significance, 48 
choice of level, 63-69 
of correlation, 137-139, 272-275, 
280 
of correlation ratio, 270-271, 280 
of curvature, 356-361 
of differences: 
for changes, 86-88, 104-105 
for correlations, 139-140 
for linear trends, 352-356
for means:
correlated, 80-83, 101-102, 294-296, 322-323, 337
independent, 83, 102-104, 265-269, 321, 337
sub- vs. total group, 94
for proportions:
correlated, 52-56, 224-226, 227-228
independent, 56-61, 221-224, 228-231
for regression coefficients, 140-143
for scores, 155, 158
for slopes, 143, 352-356
for standard deviations, 84, 246-250
for variances:
Bartlett's test, 249-250
correlated, 246
independent, 246-250
and erroneous conclusions, 64-68
of interaction, 306, 312-314, 332-335, 337, 340
levels, 48, 63-69 
of linear trend, 348-352 
of mean change, 76-77, 80-82, 101- 
102 
of multiple r, 184, 281-284 
of nonlinearity, 275-278 
of quadratic trend, 356-361 
of regression coefficient, 140-143
of reliability, 297, 300 
of skewness, 78-79 
of slope, 140-143, 348-352 
Skewness, 13, 25, 26, 27-28 
of binomial distribution, 43 
causes of, 28 
of sampling distributions: 
of correlations, 138-139 
of proportions (or percentages), 
50-51 
of standard deviations, 99 
Small sample treatment: 
of correlation, 137-138, 140 
of differences: 
for correlated means, 101-102 
for independent means, 102-104 
for variances, 246-250 
of single mean, 101 
of variance, 245-246
see also Analysis of variance 
Smoothing, 8 
Snedecor, G. W., 247 
Spearman-Brown formula, 150, 208, 
299-300 
Split-half reliability, 150 
Spurious correlation, 163, 164 
Squares and square roots, 434-442 
Standard deviation, 20-25 
for combined groups, 24 
computation of, 20-23 
sampling error of, 78 
Sheppard’s correction, 24 
Standard error, 50, 74 
of average deviation, 78 
of correlation measures: 
biserial, 191, 193 
multiple, 184 
product moment, 137 
tetrachoric, 196-197 
z (transformed r), 139 
of mean, 74-75, 240-241
for stratified sample, 384
of mean difference, 77, 81
of median, 78
of proportion, 50-51
from finite universe, 94
for stratified sample, 383
of quartile deviation, 78
of standard deviation, 78
Standard error of difference:
for changes, 86-88
correlated, 81
for medians, 84
correlated, 55
independent, 60
for scores, 155-156
for zs (transformed rs), 140
Standard error of estimate, 124-128,
173-175
Standard error of measurement, 147 
Standard score, 32, 35-37 
and T score, 36-37 
Statistic, 2 
Stratified sampling, 383-384 
"Student," 387 
Successive sampling, 74 
Sum of squares, 23, 101; see also Analy- 
sis of variance 
Suppressant variable, 186-187
assumptions and limitations, 105-108
for difference:
in correlated correlations, 140
in correlated means, 101-102
in correlated variances, 246
in independent means, 102-104
T scaling, 36
Trend analysis: 
curvilinear trend, 356-361 
differences in trends, 347 
individual trend (or slope), 354 
linear trend, 348 
correlated observations, 351-352
independent observations, 348-351 
slope differences, 352 
correlated observations, 354-356 
independent samples, 143, 352-354 
trends and interaction, 347 
True score, 146 
Two-tailed tests, 61-63 
additive nature of, 59, 129-130
and chi square, 243-244
computation, 20-23
confidence limits, 245
and correlation, 129-131, 175
difference between, 246-250
of differences, 59, 81, 129-130
estimate, 89, 241-243
homogeneity of, 249-250 
ratio, see F 
sampling distribution of, 243-244 
of sums, 59, 129-130 
theorem, 129-130 
see also Analysis of variance 
About the book...


As in former editions, this well-known work develops
a concise presentation of the statistical techniques
frequently used in psychology and education.
The level, though designed for an intermediate course,
is not beyond the grasp of students in elementary
courses who have some facility in mathematical reasoning.

Beginning with a brief introduction to simple
descriptive statistics, the author quickly proceeds to
the basic ideas of sampling errors and statistical
inference as a part of the logic of hypothesis testing.
Since so much of current psychological research
depends upon correlation analysis, the following sections
are devoted to a thorough discussion of this concept.
Then the major problem of inference is again taken
up and extended by way of the chi square and analysis
of variance techniques, thus providing the basis for
the designing of experiments.